2024-08-09 12:33:18,273 INFO [train_multi_KD3.py:1187] (2/4) Training started
2024-08-09 12:33:18,273 INFO [train_multi_KD3.py:1197] (2/4) Device: cuda:2
2024-08-09 12:33:18,280 INFO [train_multi_KD3.py:1212] (2/4) Using dtype=torch.bfloat16
2024-08-09 12:33:18,280 INFO [train_multi_KD3.py:1214] (2/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': 'a6c2f7a4-dirty', 'icefall-git-date': 'Thu Aug 8 16:21:21 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'blank_id': 0, 'vocab_size': 500, 'dtype': torch.bfloat16, 'use_amp': True}
2024-08-09 12:33:18,280 INFO [train_multi_KD3.py:1216] (2/4) About to create model
2024-08-09 12:33:18,696 INFO [model_shift.py:142] (2/4) Delta_t: 6 when computing the distillation loss
2024-08-09 12:33:18,702 INFO [train_multi_KD3.py:1220] (2/4) Number of model parameters: 66484678
2024-08-09 12:33:20,519 INFO [train_multi_KD3.py:1235] (2/4) Using DDP
2024-08-09 12:33:22,045 INFO [kd_datamodule.py:690] (2/4) About to get train 960 cuts
2024-08-09 12:33:22,107 INFO [train_multi_KD3.py:1306] (2/4) Getting audioset cuts
2024-08-09 12:33:22,107 INFO [kd_datamodule.py:900] (2/4) About to get the audioset cuts for KD.
2024-08-09 12:33:22,110 INFO [kd_datamodule.py:869] (2/4) About to get the voxceleb cuts.
2024-08-09 12:33:22,111 INFO [kd_datamodule.py:880] (2/4) Adding voxceleb2 cuts.
2024-08-09 12:33:22,116 INFO [train_multi_KD3.py:1320] (2/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True
2024-08-09 12:33:30,938 INFO [train_multi_KD3.py:1322] (2/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1904746) [underlying data type: ], CutSet(len=1187704) [underlying data type: ]]
2024-08-09 12:33:30,938 INFO [train_multi_KD3.py:1323] (2/4) Using weights: [1406195, 1904746, 1187704]
2024-08-09 12:33:30,938 INFO [train_multi_KD3.py:1332] (2/4) CutSet(len=4498645) [underlying data type: ]
2024-08-09 12:33:30,938 INFO [kd_datamodule.py:449] (2/4) Disable MUSAN
2024-08-09 12:33:30,940 INFO [kd_datamodule.py:489] (2/4) Disable SpecAugment
2024-08-09 12:33:30,940 INFO [kd_datamodule.py:491] (2/4) About to create train dataset
2024-08-09 12:33:30,941 INFO [kd_datamodule.py:528] (2/4) Using SimpleCutSampler
2024-08-09 12:33:30,941 INFO [kd_datamodule.py:536] (2/4) About to create train dataloader
2024-08-09 12:33:30,944 INFO [kd_datamodule.py:763] (2/4) About to get dev-clean cuts
2024-08-09 12:33:30,945 INFO [kd_datamodule.py:781] (2/4) About to get dev-other cuts
2024-08-09 12:33:30,947 INFO [kd_datamodule.py:570] (2/4) About to create dev dataset
2024-08-09 12:33:31,217 INFO [kd_datamodule.py:591] (2/4) About to create dev dataloader
2024-08-09 12:33:31,218 INFO [kd_datamodule.py:840] (2/4) About to get the test set of voxceleb1 set.
2024-08-09 12:33:31,220 INFO [kd_datamodule.py:570] (2/4) About to create dev dataset
2024-08-09 12:33:31,440 INFO [kd_datamodule.py:591] (2/4) About to create dev dataloader
2024-08-09 12:33:31,440 INFO [kd_datamodule.py:912] (2/4) About to get the audioset eval cuts.
2024-08-09 12:33:31,448 INFO [kd_datamodule.py:570] (2/4) About to create dev dataset
2024-08-09 12:33:31,884 INFO [kd_datamodule.py:591] (2/4) About to create dev dataloader
2024-08-09 12:33:31,884 INFO [train_multi_KD3.py:1412] (2/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset']
2024-08-09 12:33:47,536 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 0, loss[loss=1.156, beats_loss=0.7352, ecapa_loss=0.002322, whisper_loss=0.3971, over 17084.00 frames. ], tot_loss[loss=1.156, beats_loss=0.7352, ecapa_loss=0.002322, whisper_loss=0.3971, over 17084.00 frames. ], batch size: 67, lr: 2.25e-02, grad_scale: 2.0
2024-08-09 12:33:47,537 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-09 12:34:33,667 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on ASR_libri: loss=0.9193, beats_loss=0, ecapa_loss=0.006113, whisper_loss=0.8581, over 922467.00 frames.
2024-08-09 12:34:48,323 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on SV_voxceleb1: loss=0.05055, beats_loss=0, ecapa_loss=0.005055, whisper_loss=0, over 939242.00 frames.
2024-08-09 12:36:49,493 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9313, 6.0865, 6.0643, 6.0285], device='cuda:2')
2024-08-09 12:36:59,584 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on AT_audioset: loss=1.752, beats_loss=1.752, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
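The per-batch loss records in this log appear to be the scale-weighted sum of the three teacher losses, using the configured scales (beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0): for batch 0, 0.7352 + 10*0.002322 + 0.3971 ≈ 1.156, matching the reported total to display rounding. A minimal sketch of that bookkeeping (the function name is illustrative, not from train_multi_KD3.py, and the exact combination used by the script is inferred from the logged numbers):

```python
def combined_loss(beats_loss: float, ecapa_loss: float, whisper_loss: float,
                  beats_scale: float = 1.0, ecapa_scale: float = 10.0,
                  whisper_scale: float = 1.0) -> float:
    """Scale-weighted sum of the three distillation losses, using the
    loss scales from the config dump above (assumed combination)."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Values from the "Epoch 1, batch 0" record above (logged loss=1.156):
total = combined_loss(0.7352, 0.002322, 0.3971)
```

The same arithmetic reproduces the later records too, e.g. batch 50 (0.0139, 0.002138, 0.2103) gives ≈0.2456 as logged.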
2024-08-09 12:36:59,586 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB
2024-08-09 12:37:00,410 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=65.05 vs. limit=7.5
2024-08-09 12:37:01,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=7.5
2024-08-09 12:37:07,107 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 32 from LS+wenet, 17 from Vox, 25 from AS
2024-08-09 12:37:07,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=0.0, ans=0.5
2024-08-09 12:37:29,444 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=98.58 vs. limit=7.5375
2024-08-09 12:38:04,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=200.0, ans=0.049375
2024-08-09 12:38:09,613 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS
2024-08-09 12:38:09,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=200.0, ans=0.298
2024-08-09 12:38:12,057 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 from AS
2024-08-09 12:38:17,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=300.0, ans=0.8895000000000001
2024-08-09 12:38:20,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=349.92 vs. limit=7.725
2024-08-09 12:38:27,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=58.72 vs. limit=4.12
2024-08-09 12:38:27,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=90.04 vs. limit=7.6125
2024-08-09 12:38:30,104 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=136.57 vs. limit=7.725
2024-08-09 12:38:30,134 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.95 vs. limit=4.12
2024-08-09 12:38:45,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=400.0, ans=5.25
2024-08-09 12:38:48,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=400.0, ans=0.04875
2024-08-09 12:39:04,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 50, loss[loss=0.2456, beats_loss=0.0139, ecapa_loss=0.002138, whisper_loss=0.2103, over 15591.00 frames. ], tot_loss[loss=0.3339, beats_loss=0.1287, ecapa_loss=0.001922, whisper_loss=0.186, over 903276.75 frames. ], batch size: 56, lr: 2.48e-02, grad_scale: 2.0
2024-08-09 12:39:09,662 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 11 from Vox, 29 from AS
2024-08-09 12:39:12,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=500.0, ans=0.4765625
2024-08-09 12:39:14,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=500.0, ans=0.08875000000000001
2024-08-09 12:39:30,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600.0, ans=0.294
2024-08-09 12:39:54,399 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=192.58 vs. limit=7.7625
2024-08-09 12:39:56,740 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=228.34 vs. limit=8.025
2024-08-09 12:39:58,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=700.0, ans=0.243
2024-08-09 12:40:02,633 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=211.12 vs. limit=7.7625
2024-08-09 12:40:12,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=800.0, ans=0.17
2024-08-09 12:40:14,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=197.23 vs. limit=8.1
2024-08-09 12:40:23,922 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 25 from LS+wenet, 9 from Vox, 23 from AS
2024-08-09 12:40:29,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=800.0, ans=0.082
2024-08-09 12:40:33,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=900.0, ans=0.4578125
2024-08-09 12:40:34,403 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 22 from Vox, 41 from AS
2024-08-09 12:40:39,877 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 from AS
2024-08-09 12:40:51,515 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.930e+01 4.445e+01 8.118e+01 2.890e+03, threshold=8.890e+01, percent-clipped=0.0
2024-08-09 12:40:51,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 100, loss[loss=0.1887, beats_loss=0.02712, ecapa_loss=0.001484, whisper_loss=0.1468, over 23600.00 frames. ], tot_loss[loss=0.2647, beats_loss=0.07043, ecapa_loss=0.001864, whisper_loss=0.1756, over 1542053.55 frames. ], batch size: 92, lr: 2.70e-02, grad_scale: 4.0
2024-08-09 12:40:58,945 WARNING [optim.py:496] (2/4) Scaling gradients by 0.048358626663684845, model_norm_threshold=88.8975601196289
2024-08-09 12:40:59,115 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.4.encoder.layers.2.norm.log_scale with proportion 0.88, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.987e+06, grad_sumsq=2.987e+06, orig_rms_sq=1.000e+00
2024-08-09 12:41:01,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=78.31 vs.
limit=7.875
2024-08-09 12:41:17,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1100.0, ans=0.4484375
2024-08-09 12:41:18,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=442.59 vs. limit=7.9125
2024-08-09 12:41:23,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1100.0, ans=0.4484375
2024-08-09 12:41:25,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1100.0, ans=0.3625
2024-08-09 12:41:31,693 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 14 from Vox, 39 from AS
2024-08-09 12:41:34,601 WARNING [optim.py:496] (2/4) Scaling gradients by 0.011974900029599667, model_norm_threshold=88.8975601196289
2024-08-09 12:41:34,761 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.96, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.313e+07, grad_sumsq=5.313e+07, orig_rms_sq=1.000e+00
2024-08-09 12:41:37,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=188.87 vs. limit=7.95
2024-08-09 12:41:39,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=14.59 vs. limit=5.3
2024-08-09 12:41:42,511 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=55.47 vs. limit=8.4
2024-08-09 12:41:45,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=8.475
2024-08-09 12:41:46,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=16.84 vs. limit=4.52
2024-08-09 12:41:47,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=140.01 vs. limit=7.9875
2024-08-09 12:41:51,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=4.52
2024-08-09 12:41:53,504 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.95 vs. limit=8.475
2024-08-09 12:42:10,396 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=252.98 vs. limit=8.55
2024-08-09 12:42:15,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=258.67 vs. limit=8.625
2024-08-09 12:42:15,730 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 150, loss[loss=0.1729, beats_loss=0.02152, ecapa_loss=0.001661, whisper_loss=0.1347, over 20092.00 frames. ], tot_loss[loss=0.2372, beats_loss=0.05022, ecapa_loss=0.001836, whisper_loss=0.1686, over 2047074.14 frames. ], batch size: 77, lr: 2.93e-02, grad_scale: 4.0
2024-08-09 12:42:18,182 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=305.28 vs. limit=8.0625
2024-08-09 12:42:22,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=280.71 vs. limit=8.0625
2024-08-09 12:42:23,786 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=14.47 vs. limit=4.6
2024-08-09 12:42:28,156 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04562794789671898, model_norm_threshold=88.8975601196289
2024-08-09 12:42:28,329 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.64, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.426e+06, grad_sumsq=2.426e+06, orig_rms_sq=1.000e+00
2024-08-09 12:42:28,809 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 from AS
2024-08-09 12:42:32,281 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=327.51 vs. limit=8.1
2024-08-09 12:42:35,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=112.60 vs. limit=8.7
2024-08-09 12:42:39,419 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 from AS
2024-08-09 12:42:47,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1700.0, ans=5.85
2024-08-09 12:42:59,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=114.70 vs. limit=8.1375
2024-08-09 12:43:00,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1700.0, ans=0.4203125
2024-08-09 12:43:12,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1800.0, ans=0.415625
2024-08-09 12:43:23,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1900.0, ans=0.4109375
2024-08-09 12:43:25,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1900.0, ans=0.4109375
2024-08-09 12:43:26,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1900.0, ans=0.28099999999999997
2024-08-09 12:43:34,317 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+01 2.774e+01 3.640e+01 5.016e+01 7.424e+03, threshold=7.280e+01, percent-clipped=13.0
2024-08-09 12:43:34,337 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 200, loss[loss=0.1581, beats_loss=0.02148, ecapa_loss=0.001821, whisper_loss=0.1184, over 21479.00 frames. ], tot_loss[loss=0.2246, beats_loss=0.03966, ecapa_loss=0.001845, whisper_loss=0.1665, over 2425419.82 frames. ], batch size: 87, lr: 3.15e-02, grad_scale: 8.0
2024-08-09 12:43:35,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=49.75 vs. limit=8.25
2024-08-09 12:43:41,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2000.0, ans=0.8300000000000001
2024-08-09 12:43:41,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=196.33 vs.
limit=8.25
2024-08-09 12:43:54,404 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06407187134027481, model_norm_threshold=72.79639434814453
2024-08-09 12:43:54,575 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.47, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.083e+05, grad_sumsq=6.083e+05, orig_rms_sq=1.000e+00
2024-08-09 12:44:13,057 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=9.15
2024-08-09 12:44:25,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.73 vs. limit=8.3625
2024-08-09 12:44:39,568 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=97.86 vs. limit=8.4
2024-08-09 12:44:45,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=290.87 vs. limit=8.4
2024-08-09 12:44:47,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2400.0, ans=0.046
2024-08-09 12:44:47,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2400.0, ans=0.3875
2024-08-09 12:44:52,232 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 250, loss[loss=0.1795, beats_loss=0.01781, ecapa_loss=0.001742, whisper_loss=0.1443, over 19316.00 frames. ], tot_loss[loss=0.216, beats_loss=0.03333, ecapa_loss=0.001836, whisper_loss=0.1643, over 2723096.34 frames. ], batch size: 75, lr: 3.38e-02, grad_scale: 8.0
2024-08-09 12:44:56,539 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.19 vs. limit=5.625
2024-08-09 12:45:02,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2500.0, ans=0.3828125
2024-08-09 12:45:02,453 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=16.45 vs. limit=5.625
2024-08-09 12:45:03,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=2500.0, ans=0.014
2024-08-09 12:45:08,774 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=19.74 vs. limit=5.65
2024-08-09 12:45:17,175 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.68 vs. limit=9.45
2024-08-09 12:45:19,563 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 from AS
2024-08-09 12:45:21,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=33.66 vs. limit=9.45
2024-08-09 12:45:32,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2700.0, ans=0.8055
2024-08-09 12:45:34,872 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 from AS
2024-08-09 12:45:40,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2800.0, ans=0.04125
2024-08-09 12:45:40,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2800.0, ans=0.0825
2024-08-09 12:45:41,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2800.0, ans=0.36875
2024-08-09 12:45:43,817 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=189.61 vs. limit=8.55
2024-08-09 12:45:45,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=31.86 vs. limit=8.55
2024-08-09 12:45:49,837 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=68.22 vs. limit=6.4
2024-08-09 12:45:52,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2800.0, ans=0.36875
2024-08-09 12:46:00,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2900.0, ans=0.035
2024-08-09 12:46:00,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=78.78 vs. limit=9.675
2024-08-09 12:46:02,221 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=53.58 vs. limit=8.5875
2024-08-09 12:46:06,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2900.0, ans=0.081875
2024-08-09 12:46:10,766 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 3.536e+01 4.623e+01 6.113e+01 1.136e+03, threshold=9.245e+01, percent-clipped=13.0
2024-08-09 12:46:10,786 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 300, loss[loss=0.1587, beats_loss=0.02242, ecapa_loss=0.001565, whisper_loss=0.1206, over 22313.00 frames. ], tot_loss[loss=0.2089, beats_loss=0.02929, ecapa_loss=0.001803, whisper_loss=0.1616, over 2950569.17 frames. ], batch size: 90, lr: 3.60e-02, grad_scale: 8.0
2024-08-09 12:46:17,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=141.98 vs. limit=8.625
2024-08-09 12:46:19,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3000.0, ans=0.0875
2024-08-09 12:46:20,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=35.57 vs. limit=8.625
2024-08-09 12:46:32,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=82.67 vs. limit=9.825
2024-08-09 12:46:34,566 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 from AS
2024-08-09 12:46:38,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=67.52 vs. limit=8.6625
2024-08-09 12:46:38,586 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=23.22 vs.
limit=8.6625 2024-08-09 12:46:50,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.98 vs. limit=8.7 2024-08-09 12:46:53,756 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=26.69 vs. limit=8.7 2024-08-09 12:46:56,452 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-09 12:46:58,355 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.816e+01 2024-08-09 12:47:06,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3300.0, ans=0.025749999999999995 2024-08-09 12:47:09,519 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.45 vs. limit=5.32 2024-08-09 12:47:10,976 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=5.32 2024-08-09 12:47:13,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.00 vs. limit=8.775 2024-08-09 12:47:20,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3400.0, ans=0.023499999999999993 2024-08-09 12:47:21,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3400.0, ans=0.340625 2024-08-09 12:47:23,540 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=100.08 vs. 
limit=8.775 2024-08-09 12:47:26,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=27.21 vs. limit=8.775 2024-08-09 12:47:27,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3500.0, ans=0.3359375 2024-08-09 12:47:28,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=20.28 vs. limit=6.75 2024-08-09 12:47:28,582 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 350, loss[loss=0.2061, beats_loss=0.01456, ecapa_loss=0.001873, whisper_loss=0.1728, over 22700.00 frames. ], tot_loss[loss=0.2031, beats_loss=0.02621, ecapa_loss=0.001767, whisper_loss=0.1592, over 3144598.01 frames. ], batch size: 93, lr: 3.83e-02, grad_scale: 8.0 2024-08-09 12:47:30,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=85.32 vs. limit=8.8125 2024-08-09 12:47:39,009 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=77.02 vs. limit=6.75 2024-08-09 12:47:39,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3500.0, ans=8.8125 2024-08-09 12:47:40,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3500.0, ans=0.7775000000000001 2024-08-09 12:47:40,407 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=65.88 vs. limit=6.75 2024-08-09 12:47:42,214 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=19.62 vs. 
limit=6.75 2024-08-09 12:47:43,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3600.0, ans=0.264 2024-08-09 12:47:48,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=62.78 vs. limit=8.85 2024-08-09 12:47:52,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=14.59 vs. limit=5.9 2024-08-09 12:47:55,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.51 vs. limit=6.8 2024-08-09 12:48:03,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3700.0, ans=0.06124999999999997 2024-08-09 12:48:07,601 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 from AS 2024-08-09 12:48:16,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=46.30 vs. limit=8.925 2024-08-09 12:48:34,513 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.657e+00 2024-08-09 12:48:35,132 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.86 vs. limit=6.95 2024-08-09 12:48:40,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=17.55 vs. limit=8.9625 2024-08-09 12:48:43,718 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.82 vs. 
limit=5.5600000000000005 2024-08-09 12:48:46,238 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.862e+01 3.339e+01 4.177e+01 8.866e+01, threshold=6.678e+01, percent-clipped=0.0 2024-08-09 12:48:46,258 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 400, loss[loss=0.1854, beats_loss=0.01646, ecapa_loss=0.001299, whisper_loss=0.1559, over 17826.00 frames. ], tot_loss[loss=0.1969, beats_loss=0.02407, ecapa_loss=0.001725, whisper_loss=0.1555, over 3301352.28 frames. ], batch size: 65, lr: 4.05e-02, grad_scale: 16.0 2024-08-09 12:48:46,533 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 from AS 2024-08-09 12:49:02,106 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 27 from Vox, 26 from AS 2024-08-09 12:49:06,533 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 16 from Vox, 41 from AS 2024-08-09 12:49:08,773 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=28.09 vs. limit=9.0375 2024-08-09 12:49:13,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=10.575 2024-08-09 12:49:39,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=26.99 vs. limit=10.725 2024-08-09 12:49:47,053 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=35.57 vs. limit=9.15 2024-08-09 12:49:48,388 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=26.03 vs. 
limit=10.8 2024-08-09 12:49:48,831 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=3.66 2024-08-09 12:49:50,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=9.15 2024-08-09 12:49:51,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4400.0, ans=0.20600000000000002 2024-08-09 12:49:56,392 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.41 vs. limit=9.15 2024-08-09 12:50:00,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4400.0, ans=0.256 2024-08-09 12:50:03,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 450, loss[loss=0.2008, beats_loss=0.01666, ecapa_loss=0.001554, whisper_loss=0.1686, over 14274.00 frames. ], tot_loss[loss=0.1933, beats_loss=0.0221, ecapa_loss=0.001706, whisper_loss=0.1541, over 3442684.73 frames. ], batch size: 55, lr: 4.28e-02, grad_scale: 16.0 2024-08-09 12:50:06,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4500.0, ans=0.2890625 2024-08-09 12:50:20,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=10.95 2024-08-09 12:50:23,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4600.0, ans=0.739 2024-08-09 12:50:29,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=12.36 vs. 
limit=6.15 2024-08-09 12:50:31,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=36.93 vs. limit=9.225 2024-08-09 12:50:51,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=28.26 vs. limit=9.3 2024-08-09 12:50:54,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=11.1 2024-08-09 12:50:58,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4800.0, ans=0.275 2024-08-09 12:51:07,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4900.0, ans=0.7285 2024-08-09 12:51:12,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4900.0, ans=0.201 2024-08-09 12:51:17,891 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 21 from Vox, 36 from AS 2024-08-09 12:51:18,873 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.539e+01 2.551e+01 3.113e+01 4.254e+01 7.113e+01, threshold=6.225e+01, percent-clipped=1.0 2024-08-09 12:51:18,894 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 500, loss[loss=0.2052, beats_loss=0.01543, ecapa_loss=0.001581, whisper_loss=0.174, over 23259.00 frames. ], tot_loss[loss=0.1895, beats_loss=0.02072, ecapa_loss=0.001666, whisper_loss=0.1521, over 3536325.87 frames. ], batch size: 92, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:51:20,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=22.64 vs. 
limit=9.375 2024-08-09 12:51:22,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=5000.0, ans=8.125 2024-08-09 12:51:26,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=33.11 vs. limit=9.375 2024-08-09 12:51:27,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5000.0, ans=0.265625 2024-08-09 12:51:37,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=76.47 vs. limit=9.4125 2024-08-09 12:51:40,601 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=29.37 vs. limit=9.4125 2024-08-09 12:51:51,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.69 vs. limit=11.4 2024-08-09 12:51:54,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=11.4 2024-08-09 12:52:00,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=5200.0, ans=0.0675 2024-08-09 12:52:00,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.31 vs. limit=7.6 2024-08-09 12:52:01,951 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=41.44 vs. 
limit=11.4 2024-08-09 12:52:06,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=5300.0, ans=9.4875 2024-08-09 12:52:12,644 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=24.78 vs. limit=9.4875 2024-08-09 12:52:13,127 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 19 from Vox, 19 from AS 2024-08-09 12:52:16,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5300.0, ans=0.044583333333333336 2024-08-09 12:52:21,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5400.0, ans=0.009695652173913044 2024-08-09 12:52:27,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.44 vs. limit=11.55 2024-08-09 12:52:27,679 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=20.61 vs. limit=9.525 2024-08-09 12:52:31,362 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 18 from Vox, 33 from AS 2024-08-09 12:52:31,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=6.16 2024-08-09 12:52:34,829 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=29.18 vs. limit=9.5625 2024-08-09 12:52:35,443 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 550, loss[loss=0.1646, beats_loss=0.01452, ecapa_loss=0.001713, whisper_loss=0.1329, over 16696.00 frames. ], tot_loss[loss=0.1851, beats_loss=0.01974, ecapa_loss=0.001619, whisper_loss=0.1492, over 3623766.63 frames. 
], batch size: 69, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:52:38,976 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 17 from Vox, 23 from AS 2024-08-09 12:52:48,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5500.0, ans=0.2421875 2024-08-09 12:52:50,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=49.30 vs. limit=11.7 2024-08-09 12:52:51,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5600.0, ans=0.043333333333333335 2024-08-09 12:52:59,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5600.0, ans=0.2375 2024-08-09 12:53:00,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=18.28 vs. limit=9.6 2024-08-09 12:53:10,737 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.18 vs. limit=9.6375 2024-08-09 12:53:11,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5700.0, ans=0.7005 2024-08-09 12:53:14,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=29.84 vs. limit=9.6375 2024-08-09 12:53:14,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5700.0, ans=0.23281249999999998 2024-08-09 12:53:19,029 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
28 from LS+wenet, 26 from Vox, 30 from AS 2024-08-09 12:53:26,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=5800.0, ans=9.675 2024-08-09 12:53:35,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=16.13 vs. limit=9.7125 2024-08-09 12:53:40,119 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 25 from Vox, 29 from AS 2024-08-09 12:53:43,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5900.0, ans=0.2234375 2024-08-09 12:53:52,001 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.72 vs. limit=6.5 2024-08-09 12:53:52,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.26 vs. limit=12.0 2024-08-09 12:53:52,628 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.532e+01 2.262e+01 2.880e+01 3.640e+01 5.434e+01, threshold=5.761e+01, percent-clipped=0.0 2024-08-09 12:53:52,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 600, loss[loss=0.1492, beats_loss=0.01588, ecapa_loss=0.001442, whisper_loss=0.1189, over 14266.00 frames. ], tot_loss[loss=0.1829, beats_loss=0.01884, ecapa_loss=0.00159, whisper_loss=0.1482, over 3670118.44 frames. ], batch size: 56, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:53:58,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=6000.0, ans=0.21875 2024-08-09 12:53:58,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=16.57 vs. 
limit=9.75 2024-08-09 12:54:00,172 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=9.75 2024-08-09 12:54:02,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=17.48 vs. limit=9.75 2024-08-09 12:54:05,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=6000.0, ans=0.04166666666666667 2024-08-09 12:54:06,316 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.04 vs. limit=12.0 2024-08-09 12:54:07,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=6100.0, ans=0.239 2024-08-09 12:54:07,836 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=17.62 vs. limit=9.7875 2024-08-09 12:54:13,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=6100.0, ans=0.6865 2024-08-09 12:54:15,321 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.57 vs. limit=6.525 2024-08-09 12:54:30,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=6200.0, ans=0.0 2024-08-09 12:54:48,755 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 20 from Vox, 27 from AS 2024-08-09 12:55:00,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.36 vs. 
limit=8.2 2024-08-09 12:55:02,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=6400.0, ans=0.2 2024-08-09 12:55:03,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=21.90 vs. limit=9.9 2024-08-09 12:55:10,353 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 650, loss[loss=0.1568, beats_loss=0.01454, ecapa_loss=0.001602, whisper_loss=0.1262, over 18492.00 frames. ], tot_loss[loss=0.1791, beats_loss=0.01827, ecapa_loss=0.001551, whisper_loss=0.1454, over 3677290.46 frames. ], batch size: 72, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:55:11,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=9.9375 2024-08-09 12:55:25,812 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=6.640000000000001 2024-08-09 12:55:35,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.55 vs. limit=9.975 2024-08-09 12:55:40,530 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 28 from Vox, 30 from AS 2024-08-09 12:56:14,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=6900.0, ans=0.6585000000000001 2024-08-09 12:56:16,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=6900.0, ans=0.1765625 2024-08-09 12:56:17,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.46 vs. 
limit=12.675 2024-08-09 12:56:24,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.370e+01 2.323e+01 2.699e+01 3.837e+01 7.112e+01, threshold=5.398e+01, percent-clipped=6.0 2024-08-09 12:56:25,005 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 700, loss[loss=0.1641, beats_loss=0.01718, ecapa_loss=0.001147, whisper_loss=0.1355, over 22816.00 frames. ], tot_loss[loss=0.1761, beats_loss=0.01767, ecapa_loss=0.001512, whisper_loss=0.1433, over 3675502.12 frames. ], batch size: 89, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:56:27,376 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.68 vs. limit=12.75 2024-08-09 12:56:50,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=7100.0, ans=0.00932608695652174 2024-08-09 12:56:58,006 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 22 from Vox, 24 from AS 2024-08-09 12:57:04,581 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 from AS 2024-08-09 12:57:13,052 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 from AS 2024-08-09 12:57:16,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=7300.0, ans=0.15781250000000002 2024-08-09 12:57:17,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=7300.0, ans=0.15781250000000002 2024-08-09 12:57:27,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.65 vs. 
limit=8.7 2024-08-09 12:57:31,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=7400.0, ans=0.153125 2024-08-09 12:57:40,093 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 750, loss[loss=0.1768, beats_loss=0.01472, ecapa_loss=0.001312, whisper_loss=0.1489, over 23399.00 frames. ], tot_loss[loss=0.1742, beats_loss=0.01706, ecapa_loss=0.001469, whisper_loss=0.1424, over 3710107.58 frames. ], batch size: 91, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:58:10,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=10.3875 2024-08-09 12:58:57,140 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.305e+01 2.802e+01 3.610e+01 6.792e+01, threshold=5.604e+01, percent-clipped=3.0 2024-08-09 12:58:57,161 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 800, loss[loss=0.2098, beats_loss=0.01392, ecapa_loss=0.001376, whisper_loss=0.1821, over 19393.00 frames. ], tot_loss[loss=0.171, beats_loss=0.01668, ecapa_loss=0.001422, whisper_loss=0.1401, over 3746808.60 frames. ], batch size: 78, lr: 4.49e-02, grad_scale: 32.0 2024-08-09 12:58:59,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.70 vs. limit=13.5 2024-08-09 12:59:05,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=8000.0, ans=9.0 2024-08-09 12:59:05,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. 
limit=10.5 2024-08-09 12:59:09,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=8000.0, ans=0.009130434782608696 2024-08-09 12:59:28,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.80 vs. limit=5.0 2024-08-09 12:59:41,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=8300.0, ans=0.03208333333333334 2024-08-09 12:59:41,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=13.725 2024-08-09 12:59:56,656 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 22 from Vox, 23 from AS 2024-08-09 12:59:58,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=8400.0, ans=0.0 2024-08-09 12:59:58,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=8400.0, ans=0.125 2024-08-09 13:00:02,655 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.80 vs. limit=13.8 2024-08-09 13:00:07,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=8400.0, ans=0.125 2024-08-09 13:00:09,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8400.0, ans=0.216 2024-08-09 13:00:13,427 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 850, loss[loss=0.1474, beats_loss=0.01369, ecapa_loss=0.001107, whisper_loss=0.1227, over 17133.00 frames. ], tot_loss[loss=0.1693, beats_loss=0.01624, ecapa_loss=0.001377, whisper_loss=0.1393, over 3773870.22 frames. 
], batch size: 65, lr: 4.49e-02, grad_scale: 32.0 2024-08-09 13:00:23,301 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=10.6875 2024-08-09 13:00:31,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=8600.0, ans=0.125 2024-08-09 13:00:34,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=24.53 vs. limit=10.725 2024-08-09 13:00:39,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=13.95 2024-08-09 13:00:40,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=22.47 vs. limit=10.725 2024-08-09 13:00:43,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=8700.0, ans=0.125 2024-08-09 13:00:43,642 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=7.48 2024-08-09 13:00:45,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.59 vs. limit=9.35 2024-08-09 13:01:00,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=8800.0, ans=0.125 2024-08-09 13:01:09,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.00 vs. 
limit=9.4 2024-08-09 13:01:20,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8900.0, ans=0.21100000000000002 2024-08-09 13:01:23,511 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.37 vs. limit=10.8375 2024-08-09 13:01:26,639 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.500e+01 2.129e+01 2.561e+01 3.167e+01 6.018e+01, threshold=5.121e+01, percent-clipped=3.0 2024-08-09 13:01:26,660 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 900, loss[loss=0.1717, beats_loss=0.01461, ecapa_loss=0.001079, whisper_loss=0.1463, over 21786.00 frames. ], tot_loss[loss=0.1669, beats_loss=0.01598, ecapa_loss=0.00133, whisper_loss=0.1376, over 3773905.33 frames. ], batch size: 83, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:01:30,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=10.875 2024-08-09 13:01:33,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=4.35 2024-08-09 13:01:34,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9000.0, ans=0.21000000000000002 2024-08-09 13:01:37,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=9000.0, ans=9.5 2024-08-09 13:01:43,609 INFO [train_multi_KD3.py:844] (2/4) A total of 98 cuts. 25 from LS+wenet, 19 from Vox, 54 from AS 2024-08-09 13:01:48,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.84 vs. 
limit=10.9125 2024-08-09 13:01:54,733 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 14 from Vox, 32 from AS 2024-08-09 13:02:24,061 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 from AS 2024-08-09 13:02:28,162 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 from AS 2024-08-09 13:02:37,195 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=11.0625 2024-08-09 13:02:37,658 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 950, loss[loss=0.1772, beats_loss=0.01317, ecapa_loss=0.0009926, whisper_loss=0.1541, over 19647.00 frames. ], tot_loss[loss=0.1633, beats_loss=0.01588, ecapa_loss=0.001277, whisper_loss=0.1346, over 3747853.18 frames. ], batch size: 73, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:02:39,932 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=11.0625 2024-08-09 13:02:50,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=9600.0, ans=0.125 2024-08-09 13:02:52,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=9600.0, ans=0.125 2024-08-09 13:02:55,341 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 from AS 2024-08-09 13:03:03,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=9600.0, ans=0.125 2024-08-09 13:03:07,012 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 15 from Vox, 32 from AS 2024-08-09 13:03:17,907 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
30 from LS+wenet, 18 from Vox, 39 from AS 2024-08-09 13:03:19,179 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 from AS 2024-08-09 13:03:19,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=9700.0, ans=0.008760869565217391 2024-08-09 13:03:35,772 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 from AS 2024-08-09 13:03:42,743 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 12 from LS+wenet, 21 from Vox, 30 from AS 2024-08-09 13:03:43,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=9900.0, ans=0.125 2024-08-09 13:03:50,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.609e+01 2.154e+01 2.525e+01 3.011e+01 6.635e+01, threshold=5.049e+01, percent-clipped=1.0 2024-08-09 13:03:50,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1000, loss[loss=0.1846, beats_loss=0.01287, ecapa_loss=0.001156, whisper_loss=0.1602, over 22669.00 frames. ], tot_loss[loss=0.1616, beats_loss=0.01571, ecapa_loss=0.001223, whisper_loss=0.1337, over 3800182.49 frames. ], batch size: 90, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:04:03,780 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.05 vs. limit=11.2875 2024-08-09 13:04:03,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.92 vs. 
limit=10.05 2024-08-09 13:04:34,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=10300.0, ans=0.14700000000000002 2024-08-09 13:04:45,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=10300.0, ans=0.5395000000000001 2024-08-09 13:04:55,299 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 21 from Vox, 28 from AS 2024-08-09 13:04:55,934 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.48 vs. limit=7.6 2024-08-09 13:04:57,360 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.73 vs. limit=7.6 2024-08-09 13:05:01,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=10400.0, ans=0.125 2024-08-09 13:05:04,064 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1050, loss[loss=0.1086, beats_loss=0.01733, ecapa_loss=0.00106, whisper_loss=0.08067, over 14443.00 frames. ], tot_loss[loss=0.1595, beats_loss=0.01544, ecapa_loss=0.001186, whisper_loss=0.1322, over 3791594.48 frames. ], batch size: 60, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:05:19,551 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 28 from LS+wenet, 11 from Vox, 30 from AS 2024-08-09 13:05:22,815 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 13:05:36,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.37 vs. 
limit=11.5125 2024-08-09 13:05:40,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=10700.0, ans=0.022083333333333337 2024-08-09 13:05:45,744 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=11.5125 2024-08-09 13:05:52,947 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 13:05:56,959 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-09 13:05:58,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=10800.0, ans=0.0 2024-08-09 13:06:03,062 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-09 13:06:06,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=10900.0, ans=0.125 2024-08-09 13:06:18,851 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.598e+01 2.283e+01 2.878e+01 3.739e+01 7.694e+01, threshold=5.756e+01, percent-clipped=7.0 2024-08-09 13:06:18,871 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1100, loss[loss=0.1593, beats_loss=0.0165, ecapa_loss=0.0009641, whisper_loss=0.1332, over 18756.00 frames. ], tot_loss[loss=0.1584, beats_loss=0.01533, ecapa_loss=0.001138, whisper_loss=0.1317, over 3817262.55 frames. ], batch size: 74, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:06:21,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=11000.0, ans=0.5150000000000001 2024-08-09 13:06:44,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.70 vs. 
limit=15.825 2024-08-09 13:06:52,584 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.58 vs. limit=10.6 2024-08-09 13:06:55,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11200.0, ans=0.188 2024-08-09 13:06:55,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=11200.0, ans=8.48 2024-08-09 13:06:58,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.01 vs. limit=7.8 2024-08-09 13:06:59,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=11200.0, ans=0.008434782608695653 2024-08-09 13:07:11,173 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=38.70 vs. limit=11.7375 2024-08-09 13:07:19,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=11400.0, ans=0.125 2024-08-09 13:07:31,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=11500.0, ans=0.008369565217391305 2024-08-09 13:07:31,811 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1150, loss[loss=0.1374, beats_loss=0.01341, ecapa_loss=0.0009527, whisper_loss=0.1144, over 19379.00 frames. ], tot_loss[loss=0.1558, beats_loss=0.01518, ecapa_loss=0.001105, whisper_loss=0.1296, over 3803990.82 frames. ], batch size: 74, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:07:32,963 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. 
limit=11.8125 2024-08-09 13:07:38,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=16.125 2024-08-09 13:07:42,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=11500.0, ans=0.125 2024-08-09 13:08:08,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.94 vs. limit=16.275 2024-08-09 13:08:12,955 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=11.8875 2024-08-09 13:08:14,866 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-09 13:08:18,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=11800.0, ans=0.125 2024-08-09 13:08:45,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.329e+01 2.685e+01 3.204e+01 5.571e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-09 13:08:45,124 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1200, loss[loss=0.1323, beats_loss=0.01649, ecapa_loss=0.0009694, whisper_loss=0.1062, over 19052.00 frames. ], tot_loss[loss=0.1542, beats_loss=0.01505, ecapa_loss=0.001067, whisper_loss=0.1285, over 3798953.25 frames. ], batch size: 79, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:08:51,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=12000.0, ans=0.125 2024-08-09 13:08:52,346 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 13:08:52,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=12000.0, ans=0.125 2024-08-09 13:08:56,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=12.0 2024-08-09 13:08:59,255 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-09 13:09:12,613 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 25 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-09 13:09:46,802 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=12.15 2024-08-09 13:09:52,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=8.96 2024-08-09 13:09:55,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=12400.0, ans=16.8 2024-08-09 13:09:58,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1250, loss[loss=0.1108, beats_loss=0.01609, ecapa_loss=0.0008051, whisper_loss=0.08669, over 14817.00 frames. ], tot_loss[loss=0.1528, beats_loss=0.01491, ecapa_loss=0.001036, whisper_loss=0.1276, over 3784957.95 frames. ], batch size: 57, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:10:15,893 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-09 13:10:17,542 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-09 13:10:20,312 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
21 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-09 13:10:24,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12600.0, ans=0.174 2024-08-09 13:10:36,885 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-09 13:10:43,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=12800.0, ans=12.3 2024-08-09 13:10:46,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=12.3 2024-08-09 13:10:47,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=12800.0, ans=0.125 2024-08-09 13:10:50,535 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-09 13:10:53,274 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 13:11:03,226 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.20 vs. limit=12.3375 2024-08-09 13:11:05,846 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=12.3375 2024-08-09 13:11:12,987 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+01 2.459e+01 3.175e+01 4.087e+01 8.300e+01, threshold=6.351e+01, percent-clipped=6.0 2024-08-09 13:11:13,007 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1300, loss[loss=0.17, beats_loss=0.01074, ecapa_loss=0.0009422, whisper_loss=0.1498, over 22529.00 frames. ], tot_loss[loss=0.1511, beats_loss=0.01482, ecapa_loss=0.001005, whisper_loss=0.1262, over 3819998.88 frames. 
], batch size: 87, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:11:18,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=9.2 2024-08-09 13:11:26,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=13100.0, ans=0.125 2024-08-09 13:11:30,883 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 13:11:44,216 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 27 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-09 13:11:49,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=13200.0, ans=0.125 2024-08-09 13:11:56,121 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-09 13:12:02,367 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-09 13:12:12,906 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=12.525 2024-08-09 13:12:18,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=13400.0, ans=0.125 2024-08-09 13:12:24,690 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.636e-01 2024-08-09 13:12:27,001 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1350, loss[loss=0.1224, beats_loss=0.01666, ecapa_loss=0.0008946, whisper_loss=0.09681, over 20944.00 frames. ], tot_loss[loss=0.1494, beats_loss=0.01475, ecapa_loss=0.000975, whisper_loss=0.1249, over 3815342.72 frames. 
], batch size: 88, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:12:29,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=17.625 2024-08-09 13:12:34,610 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-09 13:12:41,412 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=12.6 2024-08-09 13:12:43,400 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-09 13:13:13,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=13800.0, ans=0.125 2024-08-09 13:13:22,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=13800.0, ans=0.025 2024-08-09 13:13:28,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=13900.0, ans=0.00875 2024-08-09 13:13:35,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=13900.0, ans=0.161 2024-08-09 13:13:40,987 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.453e+01 2.894e+01 3.668e+01 7.407e+01, threshold=5.787e+01, percent-clipped=1.0 2024-08-09 13:13:41,007 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1400, loss[loss=0.1331, beats_loss=0.01592, ecapa_loss=0.0008368, whisper_loss=0.1088, over 18083.00 frames. ], tot_loss[loss=0.1486, beats_loss=0.01469, ecapa_loss=0.0009424, whisper_loss=0.1245, over 3830520.91 frames. 
], batch size: 72, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:13:50,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=14000.0, ans=0.125 2024-08-09 13:14:02,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=14100.0, ans=10.0 2024-08-09 13:14:02,830 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.08 vs. limit=18.075 2024-08-09 13:14:04,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=14100.0, ans=0.40650000000000003 2024-08-09 13:14:22,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=14200.0, ans=0.0075 2024-08-09 13:14:29,331 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 13:14:33,785 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=9.719999999999999 2024-08-09 13:14:41,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=14400.0, ans=0.0077391304347826095 2024-08-09 13:14:54,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=14500.0, ans=0.125 2024-08-09 13:14:56,357 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1450, loss[loss=0.1266, beats_loss=0.01709, ecapa_loss=0.000724, whisper_loss=0.1023, over 18005.00 frames. ], tot_loss[loss=0.1469, beats_loss=0.01458, ecapa_loss=0.0009228, whisper_loss=0.1231, over 3812526.76 frames. 
], batch size: 74, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:14:56,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=14500.0, ans=0.0062500000000000056 2024-08-09 13:15:35,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=12.975 2024-08-09 13:15:35,460 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=12.975 2024-08-09 13:15:56,866 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 13:16:05,999 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.24 vs. limit=18.6 2024-08-09 13:16:08,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=14800.0, ans=0.125 2024-08-09 13:16:11,982 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-09 13:16:22,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=14900.0, ans=0.125 2024-08-09 13:16:31,848 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.675e+01 2.402e+01 3.110e+01 4.073e+01 8.821e+01, threshold=6.219e+01, percent-clipped=9.0 2024-08-09 13:16:31,873 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1500, loss[loss=0.1371, beats_loss=0.01354, ecapa_loss=0.001046, whisper_loss=0.1131, over 17257.00 frames. ], tot_loss[loss=0.1452, beats_loss=0.01454, ecapa_loss=0.0008981, whisper_loss=0.1217, over 3798771.46 frames. ], batch size: 75, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:16:44,826 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
20 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-09 13:16:45,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=13.125 2024-08-09 13:16:53,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.99 vs. limit=12.55 2024-08-09 13:16:56,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15100.0, ans=0.14900000000000002 2024-08-09 13:17:15,387 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 38 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-09 13:17:15,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=15200.0, ans=0.125 2024-08-09 13:17:45,104 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 13:17:49,895 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 29 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-09 13:17:52,915 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1550, loss[loss=0.1672, beats_loss=0.01247, ecapa_loss=0.000878, whisper_loss=0.1459, over 22140.00 frames. ], tot_loss[loss=0.145, beats_loss=0.01444, ecapa_loss=0.0008749, whisper_loss=0.1218, over 3828550.71 frames. ], batch size: 89, lr: 4.45e-02, grad_scale: 32.0 2024-08-09 13:17:53,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.97 vs. limit=19.125 2024-08-09 13:17:57,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. 
limit=5.325 2024-08-09 13:17:58,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=15500.0, ans=0.002083333333333333 2024-08-09 13:18:02,814 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.544e-02 2024-08-09 13:18:04,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=15500.0, ans=0.125 2024-08-09 13:18:16,421 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-09 13:18:21,928 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=19.2 2024-08-09 13:18:31,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=15700.0, ans=0.007456521739130435 2024-08-09 13:18:49,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.02 vs. limit=19.35 2024-08-09 13:18:55,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=15900.0, ans=0.00041666666666666935 2024-08-09 13:19:00,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=15900.0, ans=0.125 2024-08-09 13:19:06,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=13.4625 2024-08-09 13:19:10,319 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.72 vs. 
limit=19.425 2024-08-09 13:19:12,750 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.449e+01 2.841e+01 3.798e+01 6.790e+01, threshold=5.683e+01, percent-clipped=3.0 2024-08-09 13:19:12,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1600, loss[loss=0.1214, beats_loss=0.01438, ecapa_loss=0.0008274, whisper_loss=0.09871, over 17752.00 frames. ], tot_loss[loss=0.1438, beats_loss=0.01437, ecapa_loss=0.0008507, whisper_loss=0.1209, over 3835563.75 frames. ], batch size: 73, lr: 4.45e-02, grad_scale: 32.0 2024-08-09 13:19:32,957 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-09 13:19:43,507 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-09 13:20:00,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=13.6125 2024-08-09 13:20:04,085 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 25 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-09 13:20:07,773 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-09 13:20:16,784 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-09 13:20:33,546 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1650, loss[loss=0.1398, beats_loss=0.01481, ecapa_loss=0.0006813, whisper_loss=0.1181, over 14452.00 frames. ], tot_loss[loss=0.1431, beats_loss=0.01441, ecapa_loss=0.000829, whisper_loss=0.1204, over 3871660.48 frames. 
], batch size: 55, lr: 4.45e-02, grad_scale: 32.0 2024-08-09 13:20:49,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=16600.0, ans=0.0 2024-08-09 13:20:58,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=16600.0, ans=0.125 2024-08-09 13:21:03,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=16600.0, ans=0.125 2024-08-09 13:21:06,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=16700.0, ans=0.125 2024-08-09 13:21:08,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=16700.0, ans=0.125 2024-08-09 13:21:11,153 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-09 13:21:12,330 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.42 vs. limit=10.68 2024-08-09 13:21:14,934 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 8 from Vox, 37 fro AS 2024-08-09 13:21:16,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=16700.0, ans=0.133 2024-08-09 13:21:44,193 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
20 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-09 13:21:45,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16900.0, ans=0.131 2024-08-09 13:21:54,905 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.579e+01 3.058e+01 4.131e+01 8.941e+01, threshold=6.115e+01, percent-clipped=7.0 2024-08-09 13:21:54,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1700, loss[loss=0.1444, beats_loss=0.01411, ecapa_loss=0.0008051, whisper_loss=0.1222, over 21581.00 frames. ], tot_loss[loss=0.1412, beats_loss=0.01442, ecapa_loss=0.0008068, whisper_loss=0.1188, over 3870357.06 frames. ], batch size: 90, lr: 4.44e-02, grad_scale: 32.0 2024-08-09 13:22:09,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.26 vs. limit=13.875 2024-08-09 13:22:11,796 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.94 vs. limit=13.9125 2024-08-09 13:22:12,768 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-09 13:22:21,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=17100.0, ans=0.125 2024-08-09 13:22:29,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=17200.0, ans=0.007130434782608696 2024-08-09 13:22:38,003 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
34 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 13:22:38,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=17200.0, ans=0.128 2024-08-09 13:22:41,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=17300.0, ans=0.125 2024-08-09 13:22:51,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=17300.0, ans=0.127 2024-08-09 13:22:55,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=17300.0, ans=0.125 2024-08-09 13:22:59,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.29 vs. limit=14.025 2024-08-09 13:23:01,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=5.609999999999999 2024-08-09 13:23:03,748 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 13:23:13,128 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1750, loss[loss=0.1124, beats_loss=0.01552, ecapa_loss=0.0007506, whisper_loss=0.0894, over 22157.00 frames. ], tot_loss[loss=0.1399, beats_loss=0.01444, ecapa_loss=0.0007877, whisper_loss=0.1176, over 3855380.51 frames. ], batch size: 92, lr: 4.44e-02, grad_scale: 32.0 2024-08-09 13:23:15,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=17500.0, ans=14.0625 2024-08-09 13:23:21,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.71 vs. 
limit=9.375 2024-08-09 13:23:35,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.35 vs. limit=20.7 2024-08-09 13:23:40,869 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-09 13:23:46,369 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 11 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-09 13:23:49,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=14.1375 2024-08-09 13:23:56,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=17800.0, ans=0.0 2024-08-09 13:23:57,176 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-09 13:24:05,949 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.593e-01 2024-08-09 13:24:19,843 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-09 13:24:27,938 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.728e+01 3.350e+01 4.234e+01 7.677e+01, threshold=6.699e+01, percent-clipped=2.0 2024-08-09 13:24:27,974 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1800, loss[loss=0.142, beats_loss=0.01421, ecapa_loss=0.0007991, whisper_loss=0.1198, over 20083.00 frames. ], tot_loss[loss=0.1391, beats_loss=0.01438, ecapa_loss=0.0007701, whisper_loss=0.117, over 3871468.32 frames. 
], batch size: 80, lr: 4.44e-02, grad_scale: 32.0 2024-08-09 13:24:34,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=18000.0, ans=0.0 2024-08-09 13:24:38,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=18000.0, ans=0.025 2024-08-09 13:24:38,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=18000.0, ans=0.125 2024-08-09 13:24:47,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=18100.0, ans=0.0 2024-08-09 13:25:01,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=18200.0, ans=0.11800000000000002 2024-08-09 13:25:01,806 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.13 vs. limit=14.325 2024-08-09 13:25:05,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=18200.0, ans=0.125 2024-08-09 13:25:09,742 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 13:25:11,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=18300.0, ans=10.0 2024-08-09 13:25:16,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=18300.0, ans=0.125 2024-08-09 13:25:24,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=18300.0, ans=0.125 2024-08-09 13:25:40,618 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 13:25:43,244 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1850, loss[loss=0.1632, beats_loss=0.01279, ecapa_loss=0.0007628, whisper_loss=0.1428, over 24612.00 frames. ], tot_loss[loss=0.1392, beats_loss=0.01429, ecapa_loss=0.0007603, whisper_loss=0.1173, over 3839449.69 frames. ], batch size: 94, lr: 4.43e-02, grad_scale: 32.0 2024-08-09 13:25:53,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=18500.0, ans=0.0 2024-08-09 13:25:54,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=18500.0, ans=0.11500000000000002 2024-08-09 13:26:08,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=5.79 2024-08-09 13:26:27,581 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.37 vs. 
limit=14.5125 2024-08-09 13:26:35,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18800.0, ans=0.11200000000000002 2024-08-09 13:26:51,819 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=14.5875 2024-08-09 13:27:00,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.85 vs. limit=14.45 2024-08-09 13:27:03,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.592e+01 3.002e+01 4.008e+01 1.371e+02, threshold=6.005e+01, percent-clipped=3.0 2024-08-09 13:27:03,073 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1900, loss[loss=0.1654, beats_loss=0.009972, ecapa_loss=0.0007881, whisper_loss=0.1475, over 16023.00 frames. ], tot_loss[loss=0.1377, beats_loss=0.01427, ecapa_loss=0.0007646, whisper_loss=0.1158, over 3818303.58 frames. ], batch size: 61, lr: 4.43e-02, grad_scale: 32.0 2024-08-09 13:27:06,140 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.94 vs. limit=14.5 2024-08-09 13:27:09,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.24 vs. limit=14.5 2024-08-09 13:27:23,673 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-09 13:27:24,718 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.60 vs. 
limit=14.6625 2024-08-09 13:27:39,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19200.0, ans=0.10800000000000001 2024-08-09 13:27:50,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=19300.0, ans=0.07 2024-08-09 13:28:06,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.60 vs. limit=14.775 2024-08-09 13:28:20,369 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 1950, loss[loss=0.1512, beats_loss=0.01383, ecapa_loss=0.0008917, whisper_loss=0.1285, over 17287.00 frames. ], tot_loss[loss=0.1376, beats_loss=0.01417, ecapa_loss=0.0007636, whisper_loss=0.1158, over 3807796.42 frames. ], batch size: 73, lr: 4.43e-02, grad_scale: 32.0 2024-08-09 13:28:21,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=22.125 2024-08-09 13:28:34,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.64 vs. limit=14.85 2024-08-09 13:28:34,215 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=14.85 2024-08-09 13:28:50,533 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-09 13:28:59,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=19700.0, ans=0.125 2024-08-09 13:29:08,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=19800.0, ans=0.0 2024-08-09 13:29:09,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=14.925 2024-08-09 13:29:13,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=19800.0, ans=0.0 2024-08-09 13:29:21,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=19900.0, ans=0.125 2024-08-09 13:29:23,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=19900.0, ans=0.025 2024-08-09 13:29:28,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=19900.0, ans=0.125 2024-08-09 13:29:35,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.629e+01 3.262e+01 3.981e+01 7.661e+01, threshold=6.525e+01, percent-clipped=2.0 2024-08-09 13:29:35,125 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2000, loss[loss=0.1291, beats_loss=0.01588, ecapa_loss=0.0008222, whisper_loss=0.105, over 21029.00 frames. ], tot_loss[loss=0.1367, beats_loss=0.01421, ecapa_loss=0.000758, whisper_loss=0.1149, over 3783999.14 frames. 
], batch size: 89, lr: 4.42e-02, grad_scale: 64.0 2024-08-09 13:29:35,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=20000.0, ans=0.006521739130434783 2024-08-09 13:29:42,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=20000.0, ans=0.125 2024-08-09 13:29:44,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=20000.0, ans=0.125 2024-08-09 13:30:02,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=20100.0, ans=0.0 2024-08-09 13:30:07,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=20200.0, ans=0.035 2024-08-09 13:30:11,451 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-09 13:30:34,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=15.0 2024-08-09 13:30:46,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=20400.0, ans=0.125 2024-08-09 13:30:54,983 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2050, loss[loss=0.1399, beats_loss=0.0163, ecapa_loss=0.0005772, whisper_loss=0.1178, over 20039.00 frames. ], tot_loss[loss=0.1364, beats_loss=0.01426, ecapa_loss=0.0007497, whisper_loss=0.1147, over 3816207.82 frames. ], batch size: 79, lr: 4.42e-02, grad_scale: 64.0 2024-08-09 13:30:56,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.00 vs. 
limit=22.5 2024-08-09 13:31:12,660 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.611e-01 2024-08-09 13:31:15,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=20600.0, ans=0.0 2024-08-09 13:31:49,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20800.0, ans=0.1 2024-08-09 13:31:51,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=20800.0, ans=0.125 2024-08-09 13:32:03,917 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-09 13:32:08,649 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.818e+01 3.204e+01 4.044e+01 7.345e+01, threshold=6.407e+01, percent-clipped=1.0 2024-08-09 13:32:08,678 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2100, loss[loss=0.1436, beats_loss=0.01628, ecapa_loss=0.0005431, whisper_loss=0.1219, over 14420.00 frames. ], tot_loss[loss=0.136, beats_loss=0.01427, ecapa_loss=0.000741, whisper_loss=0.1143, over 3810163.45 frames. ], batch size: 55, lr: 4.42e-02, grad_scale: 64.0 2024-08-09 13:32:22,348 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.33 vs. limit=15.0 2024-08-09 13:32:23,047 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
24 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-09 13:32:26,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=21100.0, ans=0.125 2024-08-09 13:32:26,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=21100.0, ans=0.125 2024-08-09 13:32:35,487 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-09 13:33:15,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=21400.0, ans=0.125 2024-08-09 13:33:23,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2024-08-09 13:33:25,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2150, loss[loss=0.1431, beats_loss=0.01133, ecapa_loss=0.000714, whisper_loss=0.1246, over 14718.00 frames. ], tot_loss[loss=0.1361, beats_loss=0.01414, ecapa_loss=0.000735, whisper_loss=0.1146, over 3812292.68 frames. ], batch size: 57, lr: 4.41e-02, grad_scale: 64.0 2024-08-09 13:33:42,811 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 12 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-09 13:34:00,938 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
12 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 13:34:12,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=21800.0, ans=0.1 2024-08-09 13:34:37,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=21900.0, ans=0.125 2024-08-09 13:34:42,166 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.673e+01 3.209e+01 4.237e+01 7.311e+01, threshold=6.417e+01, percent-clipped=1.0 2024-08-09 13:34:42,187 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2200, loss[loss=0.1174, beats_loss=0.01931, ecapa_loss=0.0005997, whisper_loss=0.0921, over 20684.00 frames. ], tot_loss[loss=0.1368, beats_loss=0.01405, ecapa_loss=0.0007284, whisper_loss=0.1155, over 3843875.73 frames. ], batch size: 83, lr: 4.41e-02, grad_scale: 64.0 2024-08-09 13:35:13,693 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-09 13:35:14,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.60 vs. 
limit=22.5 2024-08-09 13:35:39,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=22300.0, ans=0.125 2024-08-09 13:35:45,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=22400.0, ans=0.0 2024-08-09 13:35:49,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=22400.0, ans=0.125 2024-08-09 13:35:52,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22400.0, ans=0.1 2024-08-09 13:36:01,358 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2250, loss[loss=0.09841, beats_loss=0.01441, ecapa_loss=0.0008457, whisper_loss=0.07554, over 18812.00 frames. ], tot_loss[loss=0.1363, beats_loss=0.01405, ecapa_loss=0.0007241, whisper_loss=0.115, over 3817336.23 frames. ], batch size: 81, lr: 4.40e-02, grad_scale: 64.0 2024-08-09 13:36:09,864 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 13:36:13,044 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-09 13:37:01,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2024-08-09 13:37:09,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=22700.0, ans=0.0 2024-08-09 13:37:13,583 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
21 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-09 13:37:13,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=22800.0, ans=0.00591304347826087 2024-08-09 13:37:13,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=22800.0, ans=0.125 2024-08-09 13:37:22,075 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 13:37:22,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=22800.0, ans=0.0 2024-08-09 13:37:22,624 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.07 vs. limit=15.0 2024-08-09 13:37:29,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2024-08-09 13:37:45,947 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.951e+01 3.575e+01 4.087e+01 9.473e+01, threshold=7.150e+01, percent-clipped=2.0 2024-08-09 13:37:45,967 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2300, loss[loss=0.1284, beats_loss=0.01413, ecapa_loss=0.0006414, whisper_loss=0.1078, over 21613.00 frames. ], tot_loss[loss=0.1359, beats_loss=0.01401, ecapa_loss=0.0007164, whisper_loss=0.1147, over 3837538.90 frames. ], batch size: 89, lr: 4.40e-02, grad_scale: 64.0 2024-08-09 13:38:09,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=23100.0, ans=0.0 2024-08-09 13:38:31,715 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.76 vs. 
limit=15.0 2024-08-09 13:38:37,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=23300.0, ans=0.125 2024-08-09 13:38:47,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=23300.0, ans=0.05 2024-08-09 13:38:52,061 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 14 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-09 13:39:04,924 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2350, loss[loss=0.1274, beats_loss=0.01417, ecapa_loss=0.0007033, whisper_loss=0.1062, over 14519.00 frames. ], tot_loss[loss=0.1354, beats_loss=0.01399, ecapa_loss=0.0007061, whisper_loss=0.1144, over 3808242.25 frames. ], batch size: 58, lr: 4.40e-02, grad_scale: 64.0 2024-08-09 13:39:08,715 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 30 from Vox, 24 fro AS 2024-08-09 13:39:17,317 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 27 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-09 13:39:28,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.72 vs. limit=22.5 2024-08-09 13:39:58,985 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.46 vs. limit=15.0 2024-08-09 13:40:02,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.61 vs. limit=22.5 2024-08-09 13:40:03,041 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 13:40:03,706 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.74 vs. 
limit=15.0 2024-08-09 13:40:07,383 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.08 vs. limit=10.0 2024-08-09 13:40:10,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2024-08-09 13:40:17,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=23900.0, ans=0.125 2024-08-09 13:40:23,992 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.811e+01 3.461e+01 4.504e+01 7.215e+01, threshold=6.923e+01, percent-clipped=1.0 2024-08-09 13:40:24,018 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2400, loss[loss=0.1673, beats_loss=0.01137, ecapa_loss=0.0006472, whisper_loss=0.1494, over 20398.00 frames. ], tot_loss[loss=0.1353, beats_loss=0.0139, ecapa_loss=0.0006986, whisper_loss=0.1144, over 3816485.56 frames. ], batch size: 78, lr: 4.39e-02, grad_scale: 64.0 2024-08-09 13:41:03,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=24200.0, ans=0.125 2024-08-09 13:41:08,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=24300.0, ans=0.125 2024-08-09 13:41:10,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0 2024-08-09 13:41:12,493 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 13:41:27,269 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
31 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 13:41:31,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=24400.0, ans=0.125 2024-08-09 13:41:39,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2450, loss[loss=0.1465, beats_loss=0.01123, ecapa_loss=0.0007095, whisper_loss=0.1282, over 22338.00 frames. ], tot_loss[loss=0.1354, beats_loss=0.01388, ecapa_loss=0.0006832, whisper_loss=0.1147, over 3851656.95 frames. ], batch size: 87, lr: 4.39e-02, grad_scale: 64.0 2024-08-09 13:41:58,770 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 14 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-09 13:42:01,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=24600.0, ans=0.125 2024-08-09 13:42:10,065 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 13:42:11,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.70 vs. limit=15.0 2024-08-09 13:42:20,484 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-09 13:42:23,127 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-09 13:42:26,622 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.01 vs. limit=6.0 2024-08-09 13:42:28,842 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-09 13:42:31,559 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.13 vs. limit=15.0 2024-08-09 13:42:42,455 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-09 13:42:42,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=24900.0, ans=0.125 2024-08-09 13:42:46,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=24900.0, ans=0.2 2024-08-09 13:42:51,244 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 21 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-09 13:42:52,613 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.830e+01 3.469e+01 4.522e+01 1.002e+02, threshold=6.939e+01, percent-clipped=2.0 2024-08-09 13:42:52,633 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2500, loss[loss=0.1503, beats_loss=0.009627, ecapa_loss=0.0006343, whisper_loss=0.1343, over 15146.00 frames. ], tot_loss[loss=0.1344, beats_loss=0.01391, ecapa_loss=0.0006766, whisper_loss=0.1137, over 3836413.98 frames. ], batch size: 56, lr: 4.38e-02, grad_scale: 64.0 2024-08-09 13:43:15,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25100.0, ans=0.1 2024-08-09 13:43:16,702 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-09 13:43:16,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=25100.0, ans=0.2 2024-08-09 13:43:30,141 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
22 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-09 13:43:33,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=25200.0, ans=0.0 2024-08-09 13:43:36,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25300.0, ans=0.1 2024-08-09 13:43:41,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.16 vs. limit=15.0 2024-08-09 13:43:48,731 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-09 13:43:53,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25400.0, ans=0.1 2024-08-09 13:43:58,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=25400.0, ans=0.125 2024-08-09 13:44:08,261 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2550, loss[loss=0.1439, beats_loss=0.01376, ecapa_loss=0.0006593, whisper_loss=0.1236, over 17728.00 frames. ], tot_loss[loss=0.134, beats_loss=0.01396, ecapa_loss=0.0006639, whisper_loss=0.1134, over 3842538.47 frames. ], batch size: 72, lr: 4.38e-02, grad_scale: 64.0 2024-08-09 13:44:10,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=25500.0, ans=0.125 2024-08-09 13:44:21,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=25600.0, ans=0.125 2024-08-09 13:44:25,705 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 27 from LS+wenet, 12 from Vox, 18 fro AS 2024-08-09 13:44:43,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.86 vs. 
limit=10.0 2024-08-09 13:45:15,931 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-09 13:45:18,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=25900.0, ans=0.125 2024-08-09 13:45:21,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.287e+01 3.019e+01 3.579e+01 4.793e+01 1.038e+02, threshold=7.158e+01, percent-clipped=5.0 2024-08-09 13:45:21,501 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2600, loss[loss=0.1221, beats_loss=0.0134, ecapa_loss=0.0007157, whisper_loss=0.1016, over 22482.00 frames. ], tot_loss[loss=0.1339, beats_loss=0.01389, ecapa_loss=0.0006584, whisper_loss=0.1134, over 3874267.95 frames. ], batch size: 93, lr: 4.37e-02, grad_scale: 64.0 2024-08-09 13:45:22,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.98 vs. limit=15.0 2024-08-09 13:45:35,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=26100.0, ans=0.125 2024-08-09 13:45:35,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=26100.0, ans=0.125 2024-08-09 13:45:46,557 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-09 13:45:59,651 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 13:46:34,517 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2650, loss[loss=0.1388, beats_loss=0.01616, ecapa_loss=0.0004819, whisper_loss=0.1179, over 16793.00 frames. ], tot_loss[loss=0.1335, beats_loss=0.01393, ecapa_loss=0.0006475, whisper_loss=0.1131, over 3885694.37 frames. 
], batch size: 64, lr: 4.37e-02, grad_scale: 64.0 2024-08-09 13:46:49,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=26600.0, ans=0.0 2024-08-09 13:47:05,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=26700.0, ans=0.0 2024-08-09 13:47:11,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=26700.0, ans=0.125 2024-08-09 13:47:19,695 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 13:47:36,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=26900.0, ans=0.1 2024-08-09 13:47:41,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=26900.0, ans=0.0 2024-08-09 13:47:43,878 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.55 vs. limit=22.5 2024-08-09 13:47:47,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.863e+01 3.296e+01 3.949e+01 7.406e+01, threshold=6.593e+01, percent-clipped=2.0 2024-08-09 13:47:47,428 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2700, loss[loss=0.1423, beats_loss=0.01425, ecapa_loss=0.0005957, whisper_loss=0.1221, over 19823.00 frames. ], tot_loss[loss=0.1324, beats_loss=0.01392, ecapa_loss=0.0006454, whisper_loss=0.1121, over 3874795.56 frames. 
], batch size: 78, lr: 4.36e-02, grad_scale: 64.0 2024-08-09 13:47:47,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=27000.0, ans=0.125 2024-08-09 13:47:48,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=15.0 2024-08-09 13:48:06,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=27100.0, ans=0.125 2024-08-09 13:48:20,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=27200.0, ans=0.0 2024-08-09 13:48:32,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=27300.0, ans=0.04949747468305833 2024-08-09 13:48:40,838 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.28 vs. limit=15.0 2024-08-09 13:48:53,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=27400.0, ans=0.125 2024-08-09 13:48:58,379 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 13:49:01,139 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2750, loss[loss=0.1352, beats_loss=0.01386, ecapa_loss=0.0005384, whisper_loss=0.116, over 16317.00 frames. ], tot_loss[loss=0.1329, beats_loss=0.01378, ecapa_loss=0.0006388, whisper_loss=0.1128, over 3868292.76 frames. ], batch size: 63, lr: 4.36e-02, grad_scale: 64.0 2024-08-09 13:49:19,077 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 17 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 13:49:21,298 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.28 vs. 
limit=12.0 2024-08-09 13:49:49,132 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-09 13:49:59,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=27800.0, ans=0.125 2024-08-09 13:49:59,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=27800.0, ans=0.5 2024-08-09 13:50:19,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.874e+01 3.420e+01 4.195e+01 6.815e+01, threshold=6.839e+01, percent-clipped=2.0 2024-08-09 13:50:19,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2800, loss[loss=0.1526, beats_loss=0.01253, ecapa_loss=0.0005812, whisper_loss=0.1343, over 23999.00 frames. ], tot_loss[loss=0.1324, beats_loss=0.01385, ecapa_loss=0.0006365, whisper_loss=0.1122, over 3859397.61 frames. ], batch size: 93, lr: 4.36e-02, grad_scale: 64.0 2024-08-09 13:50:28,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=28000.0, ans=0.004782608695652174 2024-08-09 13:50:39,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.03 vs. 
limit=22.5 2024-08-09 13:51:16,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=28300.0, ans=0.125 2024-08-09 13:51:24,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=28400.0, ans=0.2 2024-08-09 13:51:33,574 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.075e+00 2024-08-09 13:51:36,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=28500.0, ans=0.125 2024-08-09 13:51:38,499 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2850, loss[loss=0.1287, beats_loss=0.01258, ecapa_loss=0.0005846, whisper_loss=0.1102, over 22319.00 frames. ], tot_loss[loss=0.1329, beats_loss=0.01382, ecapa_loss=0.0006336, whisper_loss=0.1127, over 3873744.55 frames. ], batch size: 89, lr: 4.35e-02, grad_scale: 64.0 2024-08-09 13:51:43,501 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-09 13:52:04,420 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-09 13:52:15,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=28700.0, ans=0.125 2024-08-09 13:52:31,347 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-09 13:52:35,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=11.26 vs. 
limit=10.0 2024-08-09 13:52:42,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=28900.0, ans=0.125 2024-08-09 13:52:53,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.57 vs. limit=22.5 2024-08-09 13:53:00,760 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 3.002e+01 3.706e+01 4.572e+01 7.980e+01, threshold=7.411e+01, percent-clipped=5.0 2024-08-09 13:53:00,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2900, loss[loss=0.1637, beats_loss=0.01092, ecapa_loss=0.0006945, whisper_loss=0.1459, over 23422.00 frames. ], tot_loss[loss=0.1329, beats_loss=0.01381, ecapa_loss=0.0006373, whisper_loss=0.1128, over 3863365.84 frames. ], batch size: 93, lr: 4.35e-02, grad_scale: 64.0 2024-08-09 13:53:06,162 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 20 from LS+wenet, 27 from Vox, 40 from AS 2024-08-09 13:53:08,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.78 vs. limit=15.0 2024-08-09 13:53:09,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=29000.0, ans=0.125 2024-08-09 13:53:14,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=29000.0, ans=0.1 2024-08-09 13:53:15,092 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 28 from Vox, 35 from AS 2024-08-09 13:53:32,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs.
limit=15.0 2024-08-09 13:53:51,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=29300.0, ans=0.0 2024-08-09 13:53:59,407 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 41 from LS+wenet, 23 from Vox, 28 from AS 2024-08-09 13:54:19,848 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 2950, loss[loss=0.1061, beats_loss=0.01667, ecapa_loss=0.000544, whisper_loss=0.08399, over 22446.00 frames. ], tot_loss[loss=0.132, beats_loss=0.01392, ecapa_loss=0.0006335, whisper_loss=0.1117, over 3892902.93 frames. ], batch size: 92, lr: 4.34e-02, grad_scale: 64.0 2024-08-09 13:54:24,954 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 19 from Vox, 20 from AS 2024-08-09 13:54:28,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=29500.0, ans=0.0 2024-08-09 13:54:32,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=29500.0, ans=0.125 2024-08-09 13:54:43,973 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 27 from Vox, 26 from AS 2024-08-09 13:54:54,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=29700.0, ans=0.125 2024-08-09 13:55:02,396 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 21 from LS+wenet, 23 from Vox, 43 from AS 2024-08-09 13:55:17,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=29800.0, ans=0.0 2024-08-09 13:55:22,932 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts.
28 from LS+wenet, 21 from Vox, 44 from AS 2024-08-09 13:55:32,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=29900.0, ans=0.2 2024-08-09 13:55:39,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 3.111e+01 3.701e+01 4.234e+01 7.297e+01, threshold=7.402e+01, percent-clipped=0.0 2024-08-09 13:55:39,234 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3000, loss[loss=0.134, beats_loss=0.01319, ecapa_loss=0.0006236, whisper_loss=0.1145, over 22065.00 frames. ], tot_loss[loss=0.1324, beats_loss=0.0139, ecapa_loss=0.0006272, whisper_loss=0.1122, over 3887411.82 frames. ], batch size: 88, lr: 4.34e-02, grad_scale: 64.0 2024-08-09 13:55:39,235 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-09 13:56:23,631 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on ASR_libri: loss=0.3107, beats_loss=0, ecapa_loss=0.001585, whisper_loss=0.2948, over 922467.00 frames. 2024-08-09 13:56:41,595 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on SV_voxceleb1: loss=0.0159, beats_loss=0, ecapa_loss=0.00159, whisper_loss=0, over 939242.00 frames. 2024-08-09 13:57:53,595 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.5704, 1.2942, 1.1209, 1.0583, 1.2435, 1.0138, 1.4848, 1.2254], device='cuda:2') 2024-08-09 13:58:39,741 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on AT_audioset: loss=0.03327, beats_loss=0.03327, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 13:58:39,750 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-09 13:58:52,465 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
25 from LS+wenet, 24 from Vox, 41 from AS 2024-08-09 13:59:06,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=30100.0, ans=0.2 2024-08-09 13:59:34,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=30300.0, ans=0.004282608695652174 2024-08-09 13:59:44,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=30300.0, ans=0.125 2024-08-09 13:59:49,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=30400.0, ans=0.2 2024-08-09 13:59:53,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=30400.0, ans=0.0 2024-08-09 13:59:58,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=30400.0, ans=0.125 2024-08-09 13:59:59,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=30400.0, ans=0.0 2024-08-09 14:00:01,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=30400.0, ans=0.0 2024-08-09 14:00:04,775 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3050, loss[loss=0.1201, beats_loss=0.01652, ecapa_loss=0.0005146, whisper_loss=0.09843, over 18850.00 frames. ], tot_loss[loss=0.1316, beats_loss=0.0139, ecapa_loss=0.0006216, whisper_loss=0.1115, over 3875859.61 frames. ], batch size: 75, lr: 4.33e-02, grad_scale: 64.0 2024-08-09 14:00:10,900 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 25 from LS+wenet, 24 from Vox, 47 from AS 2024-08-09 14:00:21,238 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts.
16 from LS+wenet, 17 from Vox, 30 from AS 2024-08-09 14:00:34,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=30600.0, ans=0.125 2024-08-09 14:00:50,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=30800.0, ans=0.125 2024-08-09 14:01:18,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 3.101e+01 3.734e+01 4.761e+01 9.232e+01, threshold=7.468e+01, percent-clipped=3.0 2024-08-09 14:01:18,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3100, loss[loss=0.1232, beats_loss=0.01466, ecapa_loss=0.0006098, whisper_loss=0.1025, over 20656.00 frames. ], tot_loss[loss=0.1322, beats_loss=0.01375, ecapa_loss=0.0006172, whisper_loss=0.1123, over 3879721.99 frames. ], batch size: 83, lr: 4.33e-02, grad_scale: 64.0 2024-08-09 14:01:31,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=31100.0, ans=0.125 2024-08-09 14:01:34,337 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 17 from LS+wenet, 28 from Vox, 44 from AS 2024-08-09 14:01:35,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=31100.0, ans=0.125 2024-08-09 14:01:38,331 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-09 14:01:38,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=31100.0, ans=0.125 2024-08-09 14:01:41,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.65 vs.
limit=10.0 2024-08-09 14:01:47,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=31200.0, ans=0.2 2024-08-09 14:01:49,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.85 vs. limit=10.0 2024-08-09 14:01:50,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=31200.0, ans=0.04949747468305833 2024-08-09 14:01:55,645 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0 2024-08-09 14:02:02,651 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 from AS 2024-08-09 14:02:19,513 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=6.0 2024-08-09 14:02:20,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=31400.0, ans=0.05 2024-08-09 14:02:22,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=31500.0, ans=0.2 2024-08-09 14:02:23,756 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3150, loss[loss=0.1353, beats_loss=0.01446, ecapa_loss=0.0004984, whisper_loss=0.1159, over 19032.00 frames. ], tot_loss[loss=0.1312, beats_loss=0.01381, ecapa_loss=0.0006105, whisper_loss=0.1113, over 3871256.39 frames. ], batch size: 74, lr: 4.32e-02, grad_scale: 64.0 2024-08-09 14:02:29,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.89 vs.
limit=15.0 2024-08-09 14:02:32,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=31500.0, ans=0.125 2024-08-09 14:02:32,403 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.20 vs. limit=15.0 2024-08-09 14:02:37,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=31600.0, ans=10.0 2024-08-09 14:02:56,590 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 from AS 2024-08-09 14:02:57,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31700.0, ans=0.1 2024-08-09 14:03:01,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=31700.0, ans=0.04949747468305833 2024-08-09 14:03:05,814 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=15.0 2024-08-09 14:03:18,801 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts.
20 from LS+wenet, 19 from Vox, 30 from AS 2024-08-09 14:03:23,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=31900.0, ans=0.125 2024-08-09 14:03:23,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=31900.0, ans=0.003934782608695652 2024-08-09 14:03:30,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 3.005e+01 3.440e+01 4.161e+01 7.835e+01, threshold=6.880e+01, percent-clipped=1.0 2024-08-09 14:03:30,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3200, loss[loss=0.1395, beats_loss=0.01358, ecapa_loss=0.0005538, whisper_loss=0.1204, over 22418.00 frames. ], tot_loss[loss=0.1312, beats_loss=0.01377, ecapa_loss=0.000604, whisper_loss=0.1114, over 3895400.50 frames. ], batch size: 89, lr: 4.32e-02, grad_scale: 64.0 2024-08-09 14:03:40,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=32000.0, ans=0.1 2024-08-09 14:04:26,771 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.24 vs. limit=15.0 2024-08-09 14:04:36,263 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3250, loss[loss=0.119, beats_loss=0.01332, ecapa_loss=0.0005557, whisper_loss=0.1001, over 21419.00 frames. ], tot_loss[loss=0.1315, beats_loss=0.01364, ecapa_loss=0.0005992, whisper_loss=0.1119, over 3881298.71 frames. ], batch size: 84, lr: 4.31e-02, grad_scale: 64.0 2024-08-09 14:04:41,969 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts.
27 from LS+wenet, 24 from Vox, 33 from AS 2024-08-09 14:05:22,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=32800.0, ans=0.0 2024-08-09 14:05:30,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=32900.0, ans=0.125 2024-08-09 14:05:42,299 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 3.060e+01 3.523e+01 4.253e+01 9.588e+01, threshold=7.047e+01, percent-clipped=8.0 2024-08-09 14:05:42,322 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3300, loss[loss=0.1512, beats_loss=0.0149, ecapa_loss=0.0004799, whisper_loss=0.1315, over 22693.00 frames. ], tot_loss[loss=0.1311, beats_loss=0.01372, ecapa_loss=0.0005948, whisper_loss=0.1115, over 3895246.24 frames. ], batch size: 87, lr: 4.31e-02, grad_scale: 64.0 2024-08-09 14:05:43,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.17 vs. limit=15.0 2024-08-09 14:05:56,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.92 vs. limit=15.0 2024-08-09 14:06:07,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=33200.0, ans=0.1 2024-08-09 14:06:08,838 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 from AS 2024-08-09 14:06:10,081 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 from AS 2024-08-09 14:06:10,697 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.98 vs. limit=22.5 2024-08-09 14:06:11,414 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
42 from LS+wenet, 17 from Vox, 29 from AS 2024-08-09 14:06:20,626 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 from AS 2024-08-09 14:06:21,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=33300.0, ans=0.125 2024-08-09 14:06:21,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=15.0 2024-08-09 14:06:25,968 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 30 from Vox, 36 from AS 2024-08-09 14:06:28,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=33300.0, ans=0.2 2024-08-09 14:06:37,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=33400.0, ans=15.0 2024-08-09 14:06:38,429 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 17 from LS+wenet, 28 from Vox, 39 from AS 2024-08-09 14:06:47,073 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3350, loss[loss=0.1572, beats_loss=0.009918, ecapa_loss=0.0006097, whisper_loss=0.1412, over 15672.00 frames. ], tot_loss[loss=0.1309, beats_loss=0.01373, ecapa_loss=0.0005897, whisper_loss=0.1113, over 3892956.96 frames. ], batch size: 59, lr: 4.30e-02, grad_scale: 64.0 2024-08-09 14:06:50,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=33500.0, ans=0.125 2024-08-09 14:06:51,013 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 14 from Vox, 39 from AS 2024-08-09 14:06:52,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=33500.0, ans=0.125 2024-08-09 14:07:26,649 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts.
26 from LS+wenet, 20 from Vox, 29 from AS 2024-08-09 14:07:29,176 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 from AS 2024-08-09 14:07:41,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=33900.0, ans=0.0 2024-08-09 14:07:51,777 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 from AS 2024-08-09 14:07:53,333 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 3.123e+01 3.529e+01 4.678e+01 1.147e+02, threshold=7.058e+01, percent-clipped=6.0 2024-08-09 14:07:53,353 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3400, loss[loss=0.1124, beats_loss=0.01577, ecapa_loss=0.0005862, whisper_loss=0.09075, over 21506.00 frames. ], tot_loss[loss=0.1299, beats_loss=0.01387, ecapa_loss=0.0005804, whisper_loss=0.1103, over 3870198.84 frames. ], batch size: 90, lr: 4.29e-02, grad_scale: 64.0 2024-08-09 14:08:03,237 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-09 14:08:07,140 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.28 vs. limit=22.5 2024-08-09 14:08:28,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=34200.0, ans=0.125 2024-08-09 14:08:32,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=34300.0, ans=0.125 2024-08-09 14:08:33,929 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts.
28 from LS+wenet, 21 from Vox, 28 from AS 2024-08-09 14:08:35,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=34300.0, ans=0.0 2024-08-09 14:08:38,869 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 25 from Vox, 33 from AS 2024-08-09 14:08:46,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=34400.0, ans=0.0 2024-08-09 14:08:49,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=34400.0, ans=0.0 2024-08-09 14:08:50,335 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 from AS 2024-08-09 14:08:53,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=34400.0, ans=0.125 2024-08-09 14:08:56,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=34400.0, ans=15.0 2024-08-09 14:08:57,865 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3450, loss[loss=0.121, beats_loss=0.0161, ecapa_loss=0.0005712, whisper_loss=0.09915, over 16538.00 frames. ], tot_loss[loss=0.1294, beats_loss=0.01392, ecapa_loss=0.0005774, whisper_loss=0.1097, over 3859240.04 frames. ], batch size: 68, lr: 4.29e-02, grad_scale: 64.0 2024-08-09 14:09:12,093 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 13 from Vox, 29 from AS 2024-08-09 14:09:19,403 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.91 vs. limit=22.5 2024-08-09 14:09:21,362 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts.
10 from LS+wenet, 18 from Vox, 29 from AS 2024-08-09 14:09:42,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=34800.0, ans=15.0 2024-08-09 14:09:44,066 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2024-08-09 14:09:58,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.04 vs. limit=10.0 2024-08-09 14:10:02,683 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.921e+01 3.468e+01 4.313e+01 8.519e+01, threshold=6.936e+01, percent-clipped=1.0 2024-08-09 14:10:02,703 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3500, loss[loss=0.1477, beats_loss=0.01384, ecapa_loss=0.0004662, whisper_loss=0.1292, over 14922.00 frames. ], tot_loss[loss=0.1288, beats_loss=0.01395, ecapa_loss=0.0005764, whisper_loss=0.1091, over 3842441.81 frames. ], batch size: 55, lr: 4.28e-02, grad_scale: 64.0 2024-08-09 14:10:05,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=35000.0, ans=0.125 2024-08-09 14:10:18,765 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
25 from LS+wenet, 19 from Vox, 47 from AS 2024-08-09 14:10:19,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=35100.0, ans=0.1 2024-08-09 14:10:24,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=35100.0, ans=0.125 2024-08-09 14:10:26,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=35100.0, ans=0.0 2024-08-09 14:10:37,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=35200.0, ans=0.125 2024-08-09 14:10:41,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.71 vs. limit=15.0 2024-08-09 14:10:48,669 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 13 from Vox, 33 from AS 2024-08-09 14:10:52,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=35300.0, ans=0.125 2024-08-09 14:10:52,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35300.0, ans=0.1 2024-08-09 14:10:53,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.47 vs. limit=22.5 2024-08-09 14:10:56,759 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 from AS 2024-08-09 14:11:06,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.45 vs.
limit=22.5 2024-08-09 14:11:07,954 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3550, loss[loss=0.1319, beats_loss=0.01483, ecapa_loss=0.0004901, whisper_loss=0.1121, over 18653.00 frames. ], tot_loss[loss=0.1282, beats_loss=0.01394, ecapa_loss=0.0005739, whisper_loss=0.1085, over 3844917.19 frames. ], batch size: 72, lr: 4.28e-02, grad_scale: 64.0 2024-08-09 14:11:12,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.20 vs. limit=22.5 2024-08-09 14:11:12,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=15.0 2024-08-09 14:11:25,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=35600.0, ans=0.2 2024-08-09 14:11:29,604 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 from AS 2024-08-09 14:11:29,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=35600.0, ans=0.125 2024-08-09 14:11:32,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.31 vs. limit=15.0 2024-08-09 14:11:34,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=35700.0, ans=15.0 2024-08-09 14:11:51,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=35800.0, ans=0.04949747468305833 2024-08-09 14:11:55,280 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.46 vs.
limit=15.0 2024-08-09 14:12:01,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=35900.0, ans=10.0 2024-08-09 14:12:05,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35900.0, ans=0.1 2024-08-09 14:12:10,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=35900.0, ans=0.07 2024-08-09 14:12:13,378 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 3.146e+01 3.821e+01 4.721e+01 1.022e+02, threshold=7.642e+01, percent-clipped=5.0 2024-08-09 14:12:13,398 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3600, loss[loss=0.1253, beats_loss=0.01526, ecapa_loss=0.0005335, whisper_loss=0.1048, over 13820.00 frames. ], tot_loss[loss=0.1289, beats_loss=0.01383, ecapa_loss=0.0005722, whisper_loss=0.1093, over 3836502.10 frames. ], batch size: 53, lr: 4.27e-02, grad_scale: 64.0 2024-08-09 14:12:24,279 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 22 from Vox, 25 from AS 2024-08-09 14:12:36,467 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 from AS 2024-08-09 14:12:37,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.19 vs.
limit=15.0 2024-08-09 14:12:37,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=36100.0, ans=0.2 2024-08-09 14:12:37,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=36100.0, ans=0.125 2024-08-09 14:12:43,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=36200.0, ans=0.125 2024-08-09 14:13:00,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=36300.0, ans=0.0 2024-08-09 14:13:01,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=36300.0, ans=0.125 2024-08-09 14:13:03,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=36300.0, ans=0.0029782608695652175 2024-08-09 14:13:19,778 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3650, loss[loss=0.1388, beats_loss=0.01453, ecapa_loss=0.0005586, whisper_loss=0.1187, over 18410.00 frames. ], tot_loss[loss=0.1287, beats_loss=0.01376, ecapa_loss=0.0005732, whisper_loss=0.1092, over 3833019.61 frames. ], batch size: 73, lr: 4.27e-02, grad_scale: 64.0 2024-08-09 14:13:46,106 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 from AS 2024-08-09 14:13:49,671 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 19 from Vox, 27 from AS 2024-08-09 14:13:59,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.49 vs.
limit=12.0 2024-08-09 14:14:00,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=36800.0, ans=0.125 2024-08-09 14:14:00,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=36800.0, ans=0.125 2024-08-09 14:14:24,860 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.925e+01 3.373e+01 4.021e+01 6.000e+01, threshold=6.747e+01, percent-clipped=0.0 2024-08-09 14:14:24,879 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3700, loss[loss=0.1224, beats_loss=0.01413, ecapa_loss=0.0004813, whisper_loss=0.1035, over 17823.00 frames. ], tot_loss[loss=0.1294, beats_loss=0.01361, ecapa_loss=0.0005712, whisper_loss=0.1101, over 3809069.26 frames. ], batch size: 71, lr: 4.26e-02, grad_scale: 64.0 2024-08-09 14:14:47,758 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 from AS 2024-08-09 14:14:53,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37200.0, ans=0.1 2024-08-09 14:14:53,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.82 vs. limit=6.0 2024-08-09 14:15:03,655 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-09 14:15:05,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=37300.0, ans=0.04949747468305833 2024-08-09 14:15:06,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.72 vs.
limit=22.5 2024-08-09 14:15:16,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=37400.0, ans=0.125 2024-08-09 14:15:18,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=37400.0, ans=10.0 2024-08-09 14:15:18,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=37400.0, ans=0.0027391304347826086 2024-08-09 14:15:30,213 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3750, loss[loss=0.1158, beats_loss=0.01681, ecapa_loss=0.0005868, whisper_loss=0.0931, over 22469.00 frames. ], tot_loss[loss=0.129, beats_loss=0.01374, ecapa_loss=0.0005675, whisper_loss=0.1096, over 3832134.62 frames. ], batch size: 93, lr: 4.26e-02, grad_scale: 64.0 2024-08-09 14:15:38,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=37500.0, ans=0.125 2024-08-09 14:15:38,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=37500.0, ans=0.125 2024-08-09 14:15:55,092 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 from AS 2024-08-09 14:16:12,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37800.0, ans=0.1 2024-08-09 14:16:13,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=37800.0, ans=0.015 2024-08-09 14:16:16,828 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
35 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 14:16:26,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=37900.0, ans=0.002630434782608696 2024-08-09 14:16:27,843 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 14:16:30,944 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-09 14:16:33,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37900.0, ans=0.1 2024-08-09 14:16:36,532 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 3.197e+01 3.801e+01 4.581e+01 9.571e+01, threshold=7.603e+01, percent-clipped=5.0 2024-08-09 14:16:36,555 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3800, loss[loss=0.1348, beats_loss=0.01202, ecapa_loss=0.0006451, whisper_loss=0.1164, over 15687.00 frames. ], tot_loss[loss=0.129, beats_loss=0.01376, ecapa_loss=0.0005642, whisper_loss=0.1096, over 3861574.90 frames. ], batch size: 66, lr: 4.25e-02, grad_scale: 64.0 2024-08-09 14:17:04,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=38200.0, ans=0.1 2024-08-09 14:17:06,791 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-09 14:17:09,315 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-09 14:17:17,692 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-09 14:17:20,248 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
12 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 14:17:24,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=38300.0, ans=0.2 2024-08-09 14:17:33,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.72 vs. limit=6.0 2024-08-09 14:17:37,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=38400.0, ans=0.125 2024-08-09 14:17:41,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2024-08-09 14:17:41,957 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3850, loss[loss=0.1267, beats_loss=0.01419, ecapa_loss=0.0005326, whisper_loss=0.1071, over 16189.00 frames. ], tot_loss[loss=0.1285, beats_loss=0.01383, ecapa_loss=0.0005583, whisper_loss=0.1091, over 3858500.95 frames. ], batch size: 65, lr: 4.24e-02, grad_scale: 64.0 2024-08-09 14:17:54,056 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-09 14:17:55,651 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-09 14:17:56,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=28.57 vs. limit=22.5 2024-08-09 14:18:03,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2024-08-09 14:18:31,018 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 14:18:39,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=38900.0, ans=0.07 2024-08-09 14:18:40,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38900.0, ans=0.1 2024-08-09 14:18:42,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=38900.0, ans=0.2 2024-08-09 14:18:44,144 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 35 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-09 14:18:44,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=12.0 2024-08-09 14:18:49,359 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 3.021e+01 3.699e+01 4.570e+01 7.428e+01, threshold=7.398e+01, percent-clipped=0.0 2024-08-09 14:18:49,388 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3900, loss[loss=0.1499, beats_loss=0.009357, ecapa_loss=0.0006191, whisper_loss=0.1344, over 18785.00 frames. ], tot_loss[loss=0.1294, beats_loss=0.01375, ecapa_loss=0.0005556, whisper_loss=0.1101, over 3915069.04 frames. ], batch size: 73, lr: 4.24e-02, grad_scale: 64.0 2024-08-09 14:18:57,243 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-09 14:19:00,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=39000.0, ans=0.125 2024-08-09 14:19:02,634 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
28 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 14:19:08,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=39100.0, ans=0.125 2024-08-09 14:19:11,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=39100.0, ans=0.0 2024-08-09 14:19:24,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=39200.0, ans=0.125 2024-08-09 14:19:28,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=39300.0, ans=0.07 2024-08-09 14:19:30,865 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2024-08-09 14:19:32,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=39300.0, ans=0.125 2024-08-09 14:19:34,492 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 28 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-09 14:19:38,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=39300.0, ans=0.1 2024-08-09 14:19:53,071 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 14:19:53,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 3950, loss[loss=0.1195, beats_loss=0.01452, ecapa_loss=0.0005363, whisper_loss=0.09959, over 21747.00 frames. ], tot_loss[loss=0.1287, beats_loss=0.01369, ecapa_loss=0.0005532, whisper_loss=0.1095, over 3884350.86 frames. 
], batch size: 88, lr: 4.23e-02, grad_scale: 64.0 2024-08-09 14:20:12,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=39600.0, ans=0.0022608695652173915 2024-08-09 14:20:15,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=39600.0, ans=0.5 2024-08-09 14:20:21,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=39600.0, ans=0.125 2024-08-09 14:20:26,927 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 14:20:28,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=12.0 2024-08-09 14:20:31,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=39700.0, ans=0.125 2024-08-09 14:20:42,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=39800.0, ans=0.0 2024-08-09 14:20:45,149 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-09 14:20:53,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=39900.0, ans=0.025 2024-08-09 14:21:03,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.10 vs. 
limit=22.5 2024-08-09 14:21:12,619 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 3.104e+01 3.769e+01 4.628e+01 7.300e+01, threshold=7.538e+01, percent-clipped=0.0 2024-08-09 14:21:12,645 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4000, loss[loss=0.1309, beats_loss=0.01526, ecapa_loss=0.0005292, whisper_loss=0.1103, over 20996.00 frames. ], tot_loss[loss=0.1295, beats_loss=0.0136, ecapa_loss=0.0005503, whisper_loss=0.1104, over 3890353.35 frames. ], batch size: 85, lr: 4.23e-02, grad_scale: 128.0 2024-08-09 14:21:15,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=40000.0, ans=0.1 2024-08-09 14:21:31,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.92 vs. limit=15.0 2024-08-09 14:21:38,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=40200.0, ans=0.0 2024-08-09 14:22:21,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=40500.0, ans=0.04949747468305833 2024-08-09 14:22:23,204 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4050, loss[loss=0.1406, beats_loss=0.01024, ecapa_loss=0.0005857, whisper_loss=0.1245, over 21405.00 frames. ], tot_loss[loss=0.129, beats_loss=0.01361, ecapa_loss=0.0005492, whisper_loss=0.1099, over 3912430.74 frames. 
], batch size: 82, lr: 4.22e-02, grad_scale: 128.0 2024-08-09 14:22:23,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=40500.0, ans=0.125 2024-08-09 14:22:26,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=40500.0, ans=0.125 2024-08-09 14:22:30,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=40500.0, ans=0.125 2024-08-09 14:22:35,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=40600.0, ans=0.125 2024-08-09 14:22:41,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=40600.0, ans=0.0 2024-08-09 14:22:58,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=40700.0, ans=0.2 2024-08-09 14:23:04,049 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.75 vs. limit=15.0 2024-08-09 14:23:04,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=40800.0, ans=0.1 2024-08-09 14:23:15,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=40900.0, ans=0.0 2024-08-09 14:23:20,587 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
21 from LS+wenet, 19 from Vox, 41 from AS 2024-08-09 14:23:28,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 2.975e+01 3.511e+01 4.257e+01 6.601e+01, threshold=7.021e+01, percent-clipped=0.0 2024-08-09 14:23:28,284 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4100, loss[loss=0.1327, beats_loss=0.01448, ecapa_loss=0.0003961, whisper_loss=0.1142, over 22168.00 frames. ], tot_loss[loss=0.1289, beats_loss=0.01359, ecapa_loss=0.0005466, whisper_loss=0.1099, over 3889062.12 frames. ], batch size: 83, lr: 4.22e-02, grad_scale: 128.0 2024-08-09 14:23:31,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=41000.0, ans=0.95 2024-08-09 14:23:34,178 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2024-08-09 14:23:36,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2024-08-09 14:23:48,360 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 from AS 2024-08-09 14:23:57,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=41200.0, ans=0.125 2024-08-09 14:24:03,384 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 from AS 2024-08-09 14:24:04,573 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 from AS 2024-08-09 14:24:07,410 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
20 from LS+wenet, 21 from Vox, 27 from AS 2024-08-09 14:24:07,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=41300.0, ans=0.0 2024-08-09 14:24:07,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=41300.0, ans=0.5 2024-08-09 14:24:20,475 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 from AS 2024-08-09 14:24:28,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.53 vs. limit=22.5 2024-08-09 14:24:33,595 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4150, loss[loss=0.1239, beats_loss=0.01288, ecapa_loss=0.0005058, whisper_loss=0.1059, over 19373.00 frames. ], tot_loss[loss=0.1297, beats_loss=0.01346, ecapa_loss=0.000545, whisper_loss=0.1108, over 3901522.76 frames. ], batch size: 77, lr: 4.21e-02, grad_scale: 128.0 2024-08-09 14:24:40,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=41500.0, ans=0.125 2024-08-09 14:24:51,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.64 vs. limit=6.0 2024-08-09 14:25:02,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.01 vs. limit=22.5 2024-08-09 14:25:16,414 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 14:25:16,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=41800.0, ans=0.2 2024-08-09 14:25:24,764 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 19 from Vox, 30 from AS 2024-08-09 14:25:29,995 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 15 from Vox, 45 from AS 2024-08-09 14:25:37,463 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.939e+01 3.388e+01 4.308e+01 6.716e+01, threshold=6.777e+01, percent-clipped=0.0 2024-08-09 14:25:37,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4200, loss[loss=0.119, beats_loss=0.01481, ecapa_loss=0.0005595, whisper_loss=0.09864, over 21918.00 frames. ], tot_loss[loss=0.1295, beats_loss=0.0134, ecapa_loss=0.0005434, whisper_loss=0.1107, over 3902901.39 frames. ], batch size: 89, lr: 4.20e-02, grad_scale: 128.0 2024-08-09 14:25:44,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=42000.0, ans=0.125 2024-08-09 14:25:53,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=42100.0, ans=0.0 2024-08-09 14:26:10,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=42200.0, ans=0.125 2024-08-09 14:26:14,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.05 vs. limit=22.5 2024-08-09 14:26:19,322 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 18 from Vox, 37 from AS 2024-08-09 14:26:41,810 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4250, loss[loss=0.1271, beats_loss=0.01393, ecapa_loss=0.0005026, whisper_loss=0.1082, over 15394.00 frames. ], tot_loss[loss=0.1287, beats_loss=0.01346, ecapa_loss=0.0005407, whisper_loss=0.1098, over 3896980.52 frames. 
], batch size: 62, lr: 4.20e-02, grad_scale: 128.0 2024-08-09 14:26:55,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=42600.0, ans=0.07 2024-08-09 14:27:01,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=42600.0, ans=0.001608695652173914 2024-08-09 14:27:01,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=42600.0, ans=0.0 2024-08-09 14:27:06,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=42600.0, ans=0.125 2024-08-09 14:27:14,463 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 from AS 2024-08-09 14:27:18,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=42700.0, ans=0.125 2024-08-09 14:27:27,021 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.09 vs. limit=22.5 2024-08-09 14:27:32,394 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 23 from Vox, 18 from AS 2024-08-09 14:27:35,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=42900.0, ans=0.05 2024-08-09 14:27:37,292 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 12 from Vox, 35 from AS 2024-08-09 14:27:46,641 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+01 3.006e+01 3.697e+01 4.408e+01 8.760e+01, threshold=7.393e+01, percent-clipped=1.0 2024-08-09 14:27:46,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4300, loss[loss=0.1399, beats_loss=0.0117, ecapa_loss=0.0006271, whisper_loss=0.122, over 22130.00 frames. 
], tot_loss[loss=0.1285, beats_loss=0.01336, ecapa_loss=0.0005361, whisper_loss=0.1097, over 3876922.50 frames. ], batch size: 93, lr: 4.19e-02, grad_scale: 128.0 2024-08-09 14:27:51,982 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 17 from Vox, 38 from AS 2024-08-09 14:28:19,275 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 30 from Vox, 32 from AS 2024-08-09 14:28:30,476 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-08-09 14:28:32,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=43300.0, ans=0.2 2024-08-09 14:28:40,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.30 vs. limit=22.5 2024-08-09 14:28:43,657 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 25 from Vox, 29 from AS 2024-08-09 14:28:50,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.88 vs. limit=22.5 2024-08-09 14:28:51,621 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4350, loss[loss=0.1474, beats_loss=0.01291, ecapa_loss=0.0005697, whisper_loss=0.1288, over 21152.00 frames. ], tot_loss[loss=0.1287, beats_loss=0.01326, ecapa_loss=0.0005345, whisper_loss=0.1101, over 3872183.88 frames. ], batch size: 88, lr: 4.19e-02, grad_scale: 128.0 2024-08-09 14:28:52,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=43500.0, ans=0.1 2024-08-09 14:28:54,311 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
19 from LS+wenet, 13 from Vox, 23 from AS 2024-08-09 14:28:57,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=43500.0, ans=0.0 2024-08-09 14:29:00,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=43500.0, ans=0.0 2024-08-09 14:29:15,812 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-08-09 14:29:18,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=43700.0, ans=0.125 2024-08-09 14:29:24,107 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 8 from LS+wenet, 18 from Vox, 33 from AS 2024-08-09 14:29:30,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=12.0 2024-08-09 14:29:34,485 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 32 from LS+wenet, 15 from Vox, 36 from AS 2024-08-09 14:29:39,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.34 vs. limit=10.0 2024-08-09 14:29:49,710 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 from AS 2024-08-09 14:29:50,967 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 from AS 2024-08-09 14:29:57,786 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.950e+01 3.412e+01 4.173e+01 7.476e+01, threshold=6.823e+01, percent-clipped=1.0 2024-08-09 14:29:57,812 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4400, loss[loss=0.1094, beats_loss=0.0151, ecapa_loss=0.0005013, whisper_loss=0.08928, over 21314.00 frames. 
], tot_loss[loss=0.1282, beats_loss=0.01336, ecapa_loss=0.0005325, whisper_loss=0.1095, over 3854714.46 frames. ], batch size: 87, lr: 4.18e-02, grad_scale: 128.0 2024-08-09 14:30:06,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=44000.0, ans=10.0 2024-08-09 14:30:10,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=44000.0, ans=0.125 2024-08-09 14:30:47,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=44300.0, ans=10.0 2024-08-09 14:30:49,690 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 17 from Vox, 38 from AS 2024-08-09 14:30:49,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=44300.0, ans=0.05 2024-08-09 14:31:09,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.32 vs. limit=15.0 2024-08-09 14:31:22,081 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4450, loss[loss=0.1317, beats_loss=0.01387, ecapa_loss=0.0004494, whisper_loss=0.1134, over 22780.00 frames. ], tot_loss[loss=0.1279, beats_loss=0.01346, ecapa_loss=0.0005308, whisper_loss=0.1091, over 3893893.35 frames. ], batch size: 84, lr: 4.17e-02, grad_scale: 128.0 2024-08-09 14:31:24,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=44500.0, ans=10.0 2024-08-09 14:31:31,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=44500.0, ans=0.0 2024-08-09 14:31:40,552 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.69 vs. 
limit=22.5 2024-08-09 14:31:58,623 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.136e-02 2024-08-09 14:32:48,656 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.568e+01 3.039e+01 3.733e+01 4.656e+01 8.279e+01, threshold=7.465e+01, percent-clipped=2.0 2024-08-09 14:32:48,682 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4500, loss[loss=0.1461, beats_loss=0.008743, ecapa_loss=0.0006264, whisper_loss=0.1311, over 15217.00 frames. ], tot_loss[loss=0.1268, beats_loss=0.01345, ecapa_loss=0.0005286, whisper_loss=0.1081, over 3897885.10 frames. ], batch size: 59, lr: 4.17e-02, grad_scale: 128.0 2024-08-09 14:32:50,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=45000.0, ans=0.125 2024-08-09 14:32:58,405 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 from AS 2024-08-09 14:33:08,037 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS 2024-08-09 14:33:11,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=45100.0, ans=0.125 2024-08-09 14:33:13,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=45100.0, ans=0.125 2024-08-09 14:33:14,964 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 from AS 2024-08-09 14:33:22,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=45200.0, ans=0.0 2024-08-09 14:33:30,395 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
32 from LS+wenet, 21 from Vox, 29 from AS 2024-08-09 14:33:31,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=45200.0, ans=0.125 2024-08-09 14:33:36,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=45300.0, ans=0.0010217391304347834 2024-08-09 14:34:03,350 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 14 from LS+wenet, 18 from Vox, 21 from AS 2024-08-09 14:34:11,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4550, loss[loss=0.1321, beats_loss=0.007461, ecapa_loss=0.0006493, whisper_loss=0.1181, over 17573.00 frames. ], tot_loss[loss=0.1263, beats_loss=0.01342, ecapa_loss=0.0005291, whisper_loss=0.1076, over 3896018.16 frames. ], batch size: 68, lr: 4.16e-02, grad_scale: 128.0 2024-08-09 14:34:12,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=45500.0, ans=0.125 2024-08-09 14:34:18,180 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 17 from Vox, 36 from AS 2024-08-09 14:34:20,550 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.57 vs. 
limit=22.5 2024-08-09 14:34:33,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=45600.0, ans=0.5 2024-08-09 14:34:35,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=45600.0, ans=0.0 2024-08-09 14:34:42,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=45700.0, ans=0.125 2024-08-09 14:34:46,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45700.0, ans=0.1 2024-08-09 14:34:47,296 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 from AS 2024-08-09 14:34:58,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=45800.0, ans=10.0 2024-08-09 14:35:05,999 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 13 from Vox, 31 from AS 2024-08-09 14:35:26,521 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 from AS 2024-08-09 14:35:32,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.954e+01 3.369e+01 4.036e+01 7.171e+01, threshold=6.737e+01, percent-clipped=0.0 2024-08-09 14:35:32,604 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4600, loss[loss=0.108, beats_loss=0.01655, ecapa_loss=0.0004641, whisper_loss=0.08683, over 20631.00 frames. ], tot_loss[loss=0.126, beats_loss=0.01355, ecapa_loss=0.000522, whisper_loss=0.1073, over 3897556.59 frames. ], batch size: 86, lr: 4.15e-02, grad_scale: 128.0 2024-08-09 14:35:38,003 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
17 from LS+wenet, 21 from Vox, 22 from AS 2024-08-09 14:35:39,965 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2024-08-09 14:35:53,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=46100.0, ans=0.125 2024-08-09 14:35:57,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=46100.0, ans=0.125 2024-08-09 14:35:58,576 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 from AS 2024-08-09 14:36:22,111 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 from AS 2024-08-09 14:36:28,776 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 20 from Vox, 35 from AS 2024-08-09 14:36:36,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=46400.0, ans=0.0 2024-08-09 14:36:44,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=46400.0, ans=0.125 2024-08-09 14:36:45,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2024-08-09 14:36:46,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=46400.0, ans=0.0 2024-08-09 14:36:54,088 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4650, loss[loss=0.1279, beats_loss=0.01515, ecapa_loss=0.0005353, whisper_loss=0.1074, over 21287.00 frames. ], tot_loss[loss=0.1254, beats_loss=0.01367, ecapa_loss=0.0005209, whisper_loss=0.1065, over 3912947.74 frames. 
], batch size: 90, lr: 4.15e-02, grad_scale: 128.0 2024-08-09 14:36:54,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=46500.0, ans=0.125 2024-08-09 14:36:54,865 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.45 vs. limit=6.0 2024-08-09 14:36:58,493 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 from AS 2024-08-09 14:37:00,196 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 20 from Vox, 40 from AS 2024-08-09 14:37:16,168 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 18 from Vox, 47 from AS 2024-08-09 14:37:38,158 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 19 from Vox, 18 from AS 2024-08-09 14:37:57,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=46800.0, ans=0.1 2024-08-09 14:38:08,462 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 24 from Vox, 39 from AS 2024-08-09 14:38:16,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=47000.0, ans=0.125 2024-08-09 14:38:17,878 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 3.046e+01 3.609e+01 4.617e+01 7.306e+01, threshold=7.217e+01, percent-clipped=2.0 2024-08-09 14:38:17,900 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4700, loss[loss=0.1191, beats_loss=0.01362, ecapa_loss=0.0005997, whisper_loss=0.09947, over 19046.00 frames. ], tot_loss[loss=0.1258, beats_loss=0.01364, ecapa_loss=0.0005179, whisper_loss=0.107, over 3899334.59 frames. 
], batch size: 80, lr: 4.14e-02, grad_scale: 128.0 2024-08-09 14:38:25,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=47000.0, ans=0.07 2024-08-09 14:39:02,680 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 from AS 2024-08-09 14:39:08,466 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.52 vs. limit=22.5 2024-08-09 14:39:10,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=47300.0, ans=0.125 2024-08-09 14:39:14,537 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 from AS 2024-08-09 14:39:22,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=33.71 vs. limit=22.5 2024-08-09 14:39:34,251 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 23 from Vox, 26 from AS 2024-08-09 14:39:43,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4750, loss[loss=0.1079, beats_loss=0.01516, ecapa_loss=0.0004566, whisper_loss=0.0882, over 18266.00 frames. ], tot_loss[loss=0.1255, beats_loss=0.01381, ecapa_loss=0.0005129, whisper_loss=0.1065, over 3938455.62 frames. ], batch size: 73, lr: 4.14e-02, grad_scale: 128.0 2024-08-09 14:39:50,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=15.0 2024-08-09 14:39:53,818 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 from AS 2024-08-09 14:39:54,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=35.45 vs. 
limit=22.5 2024-08-09 14:40:37,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=47800.0, ans=0.2 2024-08-09 14:40:59,968 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 22 from Vox, 20 from AS 2024-08-09 14:41:05,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.532e+01 3.158e+01 3.572e+01 4.344e+01 1.074e+02, threshold=7.144e+01, percent-clipped=1.0 2024-08-09 14:41:05,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4800, loss[loss=0.09686, beats_loss=0.01521, ecapa_loss=0.0004432, whisper_loss=0.07722, over 18289.00 frames. ], tot_loss[loss=0.125, beats_loss=0.01379, ecapa_loss=0.0005133, whisper_loss=0.1061, over 3915205.26 frames. ], batch size: 75, lr: 4.13e-02, grad_scale: 128.0 2024-08-09 14:41:19,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.64 vs. limit=15.0 2024-08-09 14:41:24,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=48100.0, ans=0.035 2024-08-09 14:41:43,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.25 vs. limit=22.5 2024-08-09 14:41:56,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=48300.0, ans=0.125 2024-08-09 14:42:02,514 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.71 vs. limit=15.0 2024-08-09 14:42:04,637 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-09 14:42:06,452 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0 2024-08-09 14:42:08,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=48300.0, ans=0.0 2024-08-09 14:42:14,472 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-09 14:42:26,799 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-09 14:42:31,832 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4850, loss[loss=0.1251, beats_loss=0.01209, ecapa_loss=0.0005707, whisper_loss=0.1074, over 21347.00 frames. ], tot_loss[loss=0.1252, beats_loss=0.01379, ecapa_loss=0.0005119, whisper_loss=0.1063, over 3911221.18 frames. ], batch size: 88, lr: 4.12e-02, grad_scale: 128.0 2024-08-09 14:42:46,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=48600.0, ans=0.5 2024-08-09 14:42:47,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.63 vs. limit=10.0 2024-08-09 14:42:50,781 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-09 14:43:08,091 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=15.0 2024-08-09 14:43:08,208 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.96 vs. 
limit=15.0 2024-08-09 14:43:26,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=48800.0, ans=0.125 2024-08-09 14:43:31,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=48800.0, ans=0.00026086956521739237 2024-08-09 14:43:39,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=48800.0, ans=15.0 2024-08-09 14:43:46,286 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 30 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-09 14:43:57,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=48900.0, ans=0.125 2024-08-09 14:44:00,241 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 3.275e+01 3.682e+01 4.305e+01 7.376e+01, threshold=7.365e+01, percent-clipped=1.0 2024-08-09 14:44:00,261 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4900, loss[loss=0.1346, beats_loss=0.01288, ecapa_loss=0.0005089, whisper_loss=0.1167, over 23115.00 frames. ], tot_loss[loss=0.1255, beats_loss=0.01371, ecapa_loss=0.0005126, whisper_loss=0.1066, over 3879035.23 frames. ], batch size: 92, lr: 4.12e-02, grad_scale: 128.0 2024-08-09 14:44:36,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=49200.0, ans=0.02 2024-08-09 14:44:39,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=49200.0, ans=0.0 2024-08-09 14:45:05,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5 2024-08-09 14:45:15,401 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
25 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-09 14:45:15,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=12.0 2024-08-09 14:45:26,185 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 4950, loss[loss=0.1409, beats_loss=0.01094, ecapa_loss=0.000551, whisper_loss=0.1245, over 22850.00 frames. ], tot_loss[loss=0.1256, beats_loss=0.0136, ecapa_loss=0.0005105, whisper_loss=0.1069, over 3873097.68 frames. ], batch size: 89, lr: 4.11e-02, grad_scale: 128.0 2024-08-09 14:45:37,969 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-09 14:45:50,938 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 14:46:08,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=49700.0, ans=0.2 2024-08-09 14:46:22,449 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.26 vs. limit=22.5 2024-08-09 14:46:28,493 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 14:46:35,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=49900.0, ans=2.1739130434782553e-05 2024-08-09 14:46:46,914 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 14:46:52,301 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.233e+01 3.042e+01 3.499e+01 4.372e+01 7.194e+01, threshold=6.999e+01, percent-clipped=0.0 2024-08-09 14:46:52,327 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5000, loss[loss=0.1389, beats_loss=0.01015, ecapa_loss=0.0005659, whisper_loss=0.1231, over 20611.00 frames. 
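The `scaling.py:214` lines print `ScheduledFloat` values: dropout probabilities, skip rates, and balancer limits whose current value depends on `batch_count` (note the `attention_skip_rate` and `conv_skip_rate` entries that have already decayed to `ans=0.0` by this point in training). A plausible minimal sketch of such a schedule, assuming piecewise-linear interpolation over `(batch_count, value)` breakpoints — the function and the breakpoints below are illustrative, not icefall's actual `ScheduledFloat` API:

```python
def scheduled_float(batch_count, points):
    """Piecewise-linear schedule keyed on batch count. `points` is a
    sorted list of (batch_count, value) pairs; values are interpolated
    linearly between breakpoints and clamped at both ends."""
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# Hypothetical skip-rate schedule: by batch 46400 it has reached its
# final value of 0.0, as in the log records above.
skip_rate = scheduled_float(46400.0, [(0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0)])
```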
], tot_loss[loss=0.1264, beats_loss=0.01349, ecapa_loss=0.0005089, whisper_loss=0.1078, over 3861802.68 frames. ], batch size: 80, lr: 4.10e-02, grad_scale: 128.0 2024-08-09 14:46:53,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0 2024-08-09 14:47:10,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=50100.0, ans=10.0 2024-08-09 14:47:16,351 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 14:47:23,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=50200.0, ans=0.09899494936611666 2024-08-09 14:47:54,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=50400.0, ans=0.125 2024-08-09 14:48:04,937 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 14:48:05,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=50400.0, ans=0.125 2024-08-09 14:48:06,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=50400.0, ans=0.0 2024-08-09 14:48:08,714 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5050, loss[loss=0.1299, beats_loss=0.01465, ecapa_loss=0.0004639, whisper_loss=0.1106, over 22727.00 frames. ], tot_loss[loss=0.1258, beats_loss=0.0136, ecapa_loss=0.0005031, whisper_loss=0.1072, over 3864340.22 frames. ], batch size: 89, lr: 4.10e-02, grad_scale: 128.0 2024-08-09 14:48:14,466 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-09 14:48:33,132 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-09 14:48:34,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=50700.0, ans=0.125 2024-08-09 14:48:44,712 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-09 14:48:46,025 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-09 14:48:51,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=50800.0, ans=0.2 2024-08-09 14:48:59,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=50800.0, ans=0.125 2024-08-09 14:49:09,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=50900.0, ans=0.1 2024-08-09 14:49:15,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 3.052e+01 3.532e+01 4.388e+01 7.103e+01, threshold=7.064e+01, percent-clipped=2.0 2024-08-09 14:49:15,358 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5100, loss[loss=0.111, beats_loss=0.01482, ecapa_loss=0.0004489, whisper_loss=0.09173, over 18956.00 frames. ], tot_loss[loss=0.126, beats_loss=0.01371, ecapa_loss=0.0005013, whisper_loss=0.1073, over 3911017.13 frames. ], batch size: 76, lr: 4.09e-02, grad_scale: 128.0 2024-08-09 14:49:16,730 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 14:49:17,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=51000.0, ans=0.125 2024-08-09 14:49:23,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.93 vs. 
limit=15.0 2024-08-09 14:49:26,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=51000.0, ans=0.125 2024-08-09 14:49:29,645 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-09 14:49:49,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=51200.0, ans=0.1 2024-08-09 14:49:53,308 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 14:50:01,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=51300.0, ans=0.125 2024-08-09 14:50:05,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=51300.0, ans=0.125 2024-08-09 14:50:12,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=51400.0, ans=0.025 2024-08-09 14:50:14,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0 2024-08-09 14:50:20,139 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5150, loss[loss=0.1329, beats_loss=0.01352, ecapa_loss=0.0003684, whisper_loss=0.1157, over 24005.00 frames. ], tot_loss[loss=0.1256, beats_loss=0.01364, ecapa_loss=0.0004992, whisper_loss=0.107, over 3903178.09 frames. 
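The `scaling.py:1024` lines compare a per-module whitening `metric` against a `limit` (15.0, 22.5, etc.), logging only when the metric approaches or exceeds it. One reading consistent with the logged values is a whiteness statistic over the covariance eigenvalues of a module's output: it equals 1.0 when all channels are decorrelated with equal variance and grows toward the channel count as energy concentrates in few directions. The formula below is an assumed reconstruction for illustration, not the file's exact code:

```python
def whitening_metric(eigenvalues):
    """Assumed whiteness measure over covariance eigenvalues:
    n * sum(l^2) / (sum l)^2. Equals 1.0 for a perfectly white
    covariance (all eigenvalues equal) and approaches n, the number
    of channels, when a single direction dominates."""
    n = len(eigenvalues)
    s1 = sum(eigenvalues)
    s2 = sum(l * l for l in eigenvalues)
    return n * s2 / (s1 * s1)
```

Under this reading, a record like `metric=3.73 vs. limit=15.0` means the module's 384 channels are far from the limit at which a whitening penalty would kick in, while `metric=33.71 vs. limit=22.5` flags attention weights whose covariance has become strongly anisotropic.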
], batch size: 88, lr: 4.09e-02, grad_scale: 128.0 2024-08-09 14:50:42,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=51600.0, ans=0.125 2024-08-09 14:50:44,991 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2024-08-09 14:50:48,330 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-09 14:50:56,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=51700.0, ans=0.0 2024-08-09 14:51:00,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=51800.0, ans=0.0 2024-08-09 14:51:13,914 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.21 vs. limit=22.5 2024-08-09 14:51:18,532 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 14:51:25,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 2.954e+01 3.465e+01 4.225e+01 6.973e+01, threshold=6.929e+01, percent-clipped=0.0 2024-08-09 14:51:25,046 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5200, loss[loss=0.1569, beats_loss=0.01137, ecapa_loss=0.0005338, whisper_loss=0.1402, over 18921.00 frames. ], tot_loss[loss=0.1263, beats_loss=0.01356, ecapa_loss=0.0004963, whisper_loss=0.1078, over 3905425.23 frames. ], batch size: 73, lr: 4.08e-02, grad_scale: 128.0 2024-08-09 14:51:27,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.83 vs. limit=22.5 2024-08-09 14:51:27,698 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
29 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-09 14:51:37,933 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 14:51:48,297 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 14 from Vox, 53 fro AS 2024-08-09 14:52:01,573 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 14:52:08,940 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-09 14:52:14,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=52300.0, ans=0.0 2024-08-09 14:52:24,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=52400.0, ans=0.125 2024-08-09 14:52:28,885 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5250, loss[loss=0.1449, beats_loss=0.01286, ecapa_loss=0.0004257, whisper_loss=0.1278, over 19878.00 frames. ], tot_loss[loss=0.126, beats_loss=0.01351, ecapa_loss=0.0004949, whisper_loss=0.1075, over 3901189.61 frames. ], batch size: 74, lr: 4.07e-02, grad_scale: 128.0 2024-08-09 14:52:38,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=52500.0, ans=0.0 2024-08-09 14:52:42,120 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
21 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 14:53:11,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=52800.0, ans=0.2 2024-08-09 14:53:18,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=52800.0, ans=0.125 2024-08-09 14:53:19,564 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 14:53:24,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=52900.0, ans=0.1 2024-08-09 14:53:27,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=52900.0, ans=0.0 2024-08-09 14:53:30,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2024-08-09 14:53:33,224 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.986e+01 3.430e+01 3.984e+01 5.910e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-09 14:53:33,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5300, loss[loss=0.109, beats_loss=0.01375, ecapa_loss=0.0005083, whisper_loss=0.0902, over 17971.00 frames. ], tot_loss[loss=0.126, beats_loss=0.01349, ecapa_loss=0.0004931, whisper_loss=0.1075, over 3912690.26 frames. ], batch size: 71, lr: 4.07e-02, grad_scale: 128.0 2024-08-09 14:53:44,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=53000.0, ans=0.5 2024-08-09 14:53:53,663 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
18 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-09 14:53:55,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=53100.0, ans=0.125 2024-08-09 14:53:56,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=53100.0, ans=0.125 2024-08-09 14:54:02,541 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 14:54:05,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=53200.0, ans=0.1 2024-08-09 14:54:15,742 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-09 14:54:16,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=53300.0, ans=0.125 2024-08-09 14:54:16,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=53300.0, ans=0.125 2024-08-09 14:54:20,505 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.837e-01 2024-08-09 14:54:38,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5350, loss[loss=0.06326, beats_loss=0.01811, ecapa_loss=0.0004555, whisper_loss=0.0406, over 12388.00 frames. ], tot_loss[loss=0.1252, beats_loss=0.01349, ecapa_loss=0.0004931, whisper_loss=0.1068, over 3916673.28 frames. ], batch size: 55, lr: 4.06e-02, grad_scale: 128.0 2024-08-09 14:54:41,235 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
28 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-09 14:54:58,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=53600.0, ans=0.125 2024-08-09 14:55:08,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=53700.0, ans=0.125 2024-08-09 14:55:12,381 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 14:55:22,731 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-09 14:55:26,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=53800.0, ans=0.0 2024-08-09 14:55:30,453 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-09 14:55:31,661 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 14:55:33,601 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.75 vs. 
limit=15.0 2024-08-09 14:55:34,563 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.240e-01 2024-08-09 14:55:34,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=53900.0, ans=0.2 2024-08-09 14:55:39,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=53900.0, ans=0.125 2024-08-09 14:55:43,756 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 3.073e+01 3.494e+01 4.285e+01 8.308e+01, threshold=6.988e+01, percent-clipped=2.0 2024-08-09 14:55:43,777 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5400, loss[loss=0.1067, beats_loss=0.01647, ecapa_loss=0.0004105, whisper_loss=0.08609, over 22991.00 frames. ], tot_loss[loss=0.1259, beats_loss=0.01341, ecapa_loss=0.0004922, whisper_loss=0.1076, over 3895221.06 frames. ], batch size: 94, lr: 4.05e-02, grad_scale: 128.0 2024-08-09 14:55:53,068 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 10 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 14:56:01,067 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-09 14:56:05,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. 
limit=6.0 2024-08-09 14:56:07,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=54100.0, ans=0.125 2024-08-09 14:56:07,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=54100.0, ans=0.125 2024-08-09 14:56:08,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=54200.0, ans=0.1 2024-08-09 14:56:30,205 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 42 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-09 14:56:34,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=54400.0, ans=0.2 2024-08-09 14:56:37,943 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-09 14:56:41,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=54400.0, ans=0.0 2024-08-09 14:56:41,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=15.0 2024-08-09 14:56:47,958 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5450, loss[loss=0.09873, beats_loss=0.01277, ecapa_loss=0.0005455, whisper_loss=0.0805, over 16991.00 frames. ], tot_loss[loss=0.1255, beats_loss=0.01354, ecapa_loss=0.0004885, whisper_loss=0.107, over 3898287.54 frames. ], batch size: 69, lr: 4.05e-02, grad_scale: 128.0 2024-08-09 14:56:48,184 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
17 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 14:57:07,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=54600.0, ans=0.1 2024-08-09 14:57:08,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.31 vs. limit=22.5 2024-08-09 14:57:26,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=54800.0, ans=0.0 2024-08-09 14:57:51,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 3.087e+01 3.659e+01 4.293e+01 7.884e+01, threshold=7.318e+01, percent-clipped=2.0 2024-08-09 14:57:51,710 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5500, loss[loss=0.103, beats_loss=0.01501, ecapa_loss=0.0004385, whisper_loss=0.08361, over 16180.00 frames. ], tot_loss[loss=0.1245, beats_loss=0.01349, ecapa_loss=0.0004904, whisper_loss=0.1061, over 3872821.57 frames. ], batch size: 66, lr: 4.04e-02, grad_scale: 128.0 2024-08-09 14:58:16,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=55200.0, ans=10.0 2024-08-09 14:58:16,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=55200.0, ans=0.125 2024-08-09 14:58:19,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=55200.0, ans=0.02 2024-08-09 14:58:23,274 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-08-09 14:58:28,975 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-09 14:58:39,361 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
28 from LS+wenet, 24 from Vox, 26 from AS
2024-08-09 14:58:51,251 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5
2024-08-09 14:58:54,525 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 from AS
2024-08-09 14:58:55,863 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5550, loss[loss=0.1255, beats_loss=0.01324, ecapa_loss=0.0004996, whisper_loss=0.1073, over 14521.00 frames. ], tot_loss[loss=0.125, beats_loss=0.01351, ecapa_loss=0.0004893, whisper_loss=0.1066, over 3870442.48 frames. ], batch size: 58, lr: 4.03e-02, grad_scale: 128.0
2024-08-09 14:59:04,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=55500.0, ans=0.0
2024-08-09 14:59:26,278 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 20 from Vox, 25 from AS
2024-08-09 14:59:28,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=55700.0, ans=0.0
2024-08-09 14:59:32,385 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 17 from Vox, 47 from AS
2024-08-09 14:59:38,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=55800.0, ans=0.0
2024-08-09 14:59:38,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.03 vs. limit=15.0
2024-08-09 14:59:39,211 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 35 from LS+wenet, 12 from Vox, 29 from AS
2024-08-09 14:59:41,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=55800.0, ans=0.0
2024-08-09 14:59:50,321 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 from AS
2024-08-09 14:59:56,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=55900.0, ans=0.125
2024-08-09 14:59:57,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.63 vs. limit=22.5
2024-08-09 14:59:58,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=56000.0, ans=0.125
2024-08-09 14:59:59,658 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 3.194e+01 3.634e+01 4.385e+01 7.525e+01, threshold=7.268e+01, percent-clipped=1.0
2024-08-09 14:59:59,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5600, loss[loss=0.1357, beats_loss=0.01053, ecapa_loss=0.0006359, whisper_loss=0.1189, over 21360.00 frames. ], tot_loss[loss=0.1259, beats_loss=0.01341, ecapa_loss=0.0004886, whisper_loss=0.1076, over 3879300.11 frames. ], batch size: 91, lr: 4.03e-02, grad_scale: 128.0
2024-08-09 14:59:59,809 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 from AS
2024-08-09 15:00:16,563 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 19 from LS+wenet, 29 from Vox, 40 from AS
2024-08-09 15:00:19,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.89 vs. limit=22.5
2024-08-09 15:00:19,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.72 vs. limit=22.5
2024-08-09 15:00:24,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=56200.0, ans=0.1
2024-08-09 15:00:30,457 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 from AS
2024-08-09 15:00:30,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=56200.0, ans=0.2
2024-08-09 15:00:34,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=56200.0, ans=0.125
2024-08-09 15:00:48,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=56300.0, ans=0.0
2024-08-09 15:01:00,356 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 from AS
2024-08-09 15:01:03,881 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5650, loss[loss=0.1559, beats_loss=0.01047, ecapa_loss=0.0006246, whisper_loss=0.1392, over 20663.00 frames. ], tot_loss[loss=0.1253, beats_loss=0.01348, ecapa_loss=0.0004872, whisper_loss=0.1069, over 3898507.50 frames. ], batch size: 81, lr: 4.02e-02, grad_scale: 128.0
2024-08-09 15:01:09,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=56500.0, ans=0.125
2024-08-09 15:01:09,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0
2024-08-09 15:01:14,418 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 19 from LS+wenet, 26 from Vox, 43 from AS
2024-08-09 15:01:15,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=56600.0, ans=0.0
2024-08-09 15:01:24,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=56600.0, ans=0.0
2024-08-09 15:01:26,748 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 from AS
2024-08-09 15:01:28,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0
2024-08-09 15:01:42,903 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 23 from Vox, 33 from AS
2024-08-09 15:01:50,736 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 34 from LS+wenet, 20 from Vox, 30 from AS
2024-08-09 15:01:52,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=15.0
2024-08-09 15:01:54,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=56900.0, ans=0.0
2024-08-09 15:02:08,363 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 3.137e+01 3.741e+01 4.572e+01 6.525e+01, threshold=7.481e+01, percent-clipped=0.0
2024-08-09 15:02:08,387 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5700, loss[loss=0.09715, beats_loss=0.01486, ecapa_loss=0.0004603, whisper_loss=0.07769, over 17605.00 frames. ], tot_loss[loss=0.125, beats_loss=0.01348, ecapa_loss=0.0004886, whisper_loss=0.1066, over 3881974.31 frames. ], batch size: 72, lr: 4.02e-02, grad_scale: 128.0
2024-08-09 15:02:08,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=57000.0, ans=0.125
2024-08-09 15:02:25,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=57100.0, ans=0.125
2024-08-09 15:02:25,689 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-09 15:02:30,931 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 from AS
2024-08-09 15:02:32,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=57100.0, ans=0.1
2024-08-09 15:02:36,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=57200.0, ans=0.125
2024-08-09 15:02:39,697 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 from AS
2024-08-09 15:02:42,815 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.43 vs. limit=15.0
2024-08-09 15:02:49,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.21 vs. limit=22.5
2024-08-09 15:03:00,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=57400.0, ans=0.0
2024-08-09 15:03:04,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.57 vs. limit=22.5
2024-08-09 15:03:05,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=57400.0, ans=0.2
2024-08-09 15:03:13,291 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5750, loss[loss=0.1516, beats_loss=0.01149, ecapa_loss=0.0004298, whisper_loss=0.1358, over 17341.00 frames. ], tot_loss[loss=0.1247, beats_loss=0.01343, ecapa_loss=0.0004857, whisper_loss=0.1064, over 3870021.31 frames. ], batch size: 67, lr: 4.01e-02, grad_scale: 128.0
2024-08-09 15:03:13,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=57500.0, ans=0.125
2024-08-09 15:03:21,053 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 from AS
2024-08-09 15:03:38,367 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 17 from Vox, 24 from AS
2024-08-09 15:03:41,077 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 16 from Vox, 35 from AS
2024-08-09 15:03:41,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57700.0, ans=0.1
2024-08-09 15:03:46,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=57700.0, ans=0.125
2024-08-09 15:04:08,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57900.0, ans=0.1
2024-08-09 15:04:14,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.88 vs. limit=15.0
2024-08-09 15:04:18,883 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.151e+01 2.931e+01 3.260e+01 3.924e+01 8.527e+01, threshold=6.521e+01, percent-clipped=1.0
2024-08-09 15:04:18,905 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5800, loss[loss=0.13, beats_loss=0.01455, ecapa_loss=0.0004957, whisper_loss=0.1105, over 14827.00 frames. ], tot_loss[loss=0.1244, beats_loss=0.01353, ecapa_loss=0.0004806, whisper_loss=0.106, over 3839071.16 frames. ], batch size: 62, lr: 4.00e-02, grad_scale: 128.0
2024-08-09 15:04:29,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=58000.0, ans=0.125
2024-08-09 15:04:32,898 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 from AS
2024-08-09 15:04:50,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=15.0
2024-08-09 15:05:00,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=58300.0, ans=0.125
2024-08-09 15:05:02,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=58300.0, ans=0.125
2024-08-09 15:05:10,275 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0
2024-08-09 15:05:25,425 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5850, loss[loss=0.1296, beats_loss=0.01272, ecapa_loss=0.0004076, whisper_loss=0.1128, over 17535.00 frames. ], tot_loss[loss=0.1238, beats_loss=0.01365, ecapa_loss=0.0004784, whisper_loss=0.1054, over 3845527.60 frames. ], batch size: 67, lr: 4.00e-02, grad_scale: 128.0
2024-08-09 15:05:30,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=58500.0, ans=0.125
2024-08-09 15:05:36,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=58500.0, ans=0.1
2024-08-09 15:05:43,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=58600.0, ans=0.1
2024-08-09 15:06:05,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=58800.0, ans=0.1
2024-08-09 15:06:05,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=58800.0, ans=0.125
2024-08-09 15:06:07,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=58800.0, ans=0.2
2024-08-09 15:06:09,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.49 vs. limit=10.0
2024-08-09 15:06:11,977 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 from AS
2024-08-09 15:06:12,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=12.0
2024-08-09 15:06:24,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=58900.0, ans=0.0
2024-08-09 15:06:33,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=59000.0, ans=0.125
2024-08-09 15:06:33,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=59000.0, ans=0.0
2024-08-09 15:06:34,142 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 3.149e+01 3.698e+01 4.735e+01 7.316e+01, threshold=7.396e+01, percent-clipped=3.0
2024-08-09 15:06:34,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5900, loss[loss=0.1193, beats_loss=0.009346, ecapa_loss=0.000514, whisper_loss=0.1049, over 21575.00 frames. ], tot_loss[loss=0.1239, beats_loss=0.01359, ecapa_loss=0.0004764, whisper_loss=0.1056, over 3848937.14 frames. ], batch size: 87, lr: 3.99e-02, grad_scale: 128.0
2024-08-09 15:06:34,366 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 29 from Vox, 21 from AS
2024-08-09 15:06:40,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0
2024-08-09 15:06:52,925 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 20 from Vox, 20 from AS
2024-08-09 15:06:56,975 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 23 from Vox, 32 from AS
2024-08-09 15:07:27,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=59400.0, ans=0.125
2024-08-09 15:07:27,843 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0
2024-08-09 15:07:28,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=59400.0, ans=0.035
2024-08-09 15:07:29,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=20.11 vs. limit=15.0
2024-08-09 15:07:31,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=59400.0, ans=0.125
2024-08-09 15:07:32,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=59400.0, ans=0.09899494936611666
2024-08-09 15:07:40,596 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 5950, loss[loss=0.1651, beats_loss=0.009499, ecapa_loss=0.0005784, whisper_loss=0.1498, over 22410.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01364, ecapa_loss=0.0004743, whisper_loss=0.1051, over 3844520.06 frames. ], batch size: 89, lr: 3.98e-02, grad_scale: 128.0
2024-08-09 15:07:49,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59500.0, ans=0.1
2024-08-09 15:07:53,580 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 from AS
2024-08-09 15:07:54,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=59600.0, ans=0.035
2024-08-09 15:07:56,663 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS
2024-08-09 15:08:10,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=59700.0, ans=0.125
2024-08-09 15:08:11,596 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 32 from LS+wenet, 21 from Vox, 23 from AS
2024-08-09 15:08:20,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=59700.0, ans=0.125
2024-08-09 15:08:26,520 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0
2024-08-09 15:08:31,354 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 15 from Vox, 36 from AS
2024-08-09 15:08:34,847 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.81 vs. limit=15.0
2024-08-09 15:08:40,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=59900.0, ans=0.125
2024-08-09 15:08:46,778 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 24 from Vox, 29 from AS
2024-08-09 15:08:52,066 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.193e+01 2.855e+01 3.241e+01 4.234e+01 7.891e+01, threshold=6.482e+01, percent-clipped=2.0
2024-08-09 15:08:52,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6000, loss[loss=0.1424, beats_loss=0.01071, ecapa_loss=0.0005216, whisper_loss=0.1265, over 19465.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01357, ecapa_loss=0.0004735, whisper_loss=0.1052, over 3851637.42 frames. ], batch size: 78, lr: 3.98e-02, grad_scale: 256.0
2024-08-09 15:08:52,087 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-09 15:09:28,546 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on ASR_libri: loss=0.2951, beats_loss=0, ecapa_loss=0.001297, whisper_loss=0.2822, over 922467.00 frames.
2024-08-09 15:09:46,183 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on SV_voxceleb1: loss=0.01236, beats_loss=0, ecapa_loss=0.001236, whisper_loss=0, over 939242.00 frames.
2024-08-09 15:10:08,615 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.6589, 2.0200, 2.1908, 1.8712], device='cuda:2')
2024-08-09 15:11:29,861 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on AT_audioset: loss=0.03246, beats_loss=0.03246, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-09 15:11:29,866 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB
2024-08-09 15:11:49,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=60100.0, ans=0.125
2024-08-09 15:12:07,785 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0
2024-08-09 15:12:19,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=60300.0, ans=15.0
2024-08-09 15:12:24,029 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=15.0
2024-08-09 15:12:44,990 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6050, loss[loss=0.1179, beats_loss=0.01528, ecapa_loss=0.0004353, whisper_loss=0.09825, over 21846.00 frames. ], tot_loss[loss=0.1237, beats_loss=0.01364, ecapa_loss=0.000468, whisper_loss=0.1054, over 3829610.14 frames. ], batch size: 90, lr: 3.97e-02, grad_scale: 256.0
2024-08-09 15:12:53,734 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 16 from LS+wenet, 26 from Vox, 42 from AS
2024-08-09 15:12:55,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=60500.0, ans=0.125
2024-08-09 15:13:30,618 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 from AS
2024-08-09 15:13:40,660 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 from AS
2024-08-09 15:13:59,498 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 3.011e+01 3.542e+01 4.337e+01 6.873e+01, threshold=7.084e+01, percent-clipped=1.0
2024-08-09 15:13:59,519 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6100, loss[loss=0.1141, beats_loss=0.01279, ecapa_loss=0.0005721, whisper_loss=0.09554, over 18379.00 frames. ], tot_loss[loss=0.1244, beats_loss=0.01347, ecapa_loss=0.000471, whisper_loss=0.1062, over 3876672.69 frames. ], batch size: 78, lr: 3.96e-02, grad_scale: 256.0
2024-08-09 15:14:21,598 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 from AS
2024-08-09 15:14:23,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=61100.0, ans=0.125
2024-08-09 15:14:29,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=61200.0, ans=0.1
2024-08-09 15:14:53,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=61300.0, ans=0.125
2024-08-09 15:14:56,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=61300.0, ans=0.125
2024-08-09 15:15:13,749 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6150, loss[loss=0.142, beats_loss=0.01206, ecapa_loss=0.000529, whisper_loss=0.1246, over 18154.00 frames. ], tot_loss[loss=0.1249, beats_loss=0.01342, ecapa_loss=0.0004707, whisper_loss=0.1068, over 3866579.13 frames. ], batch size: 74, lr: 3.96e-02, grad_scale: 256.0
2024-08-09 15:15:24,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=61500.0, ans=0.0
2024-08-09 15:15:33,682 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 20 from Vox, 28 from AS
2024-08-09 15:15:33,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=61600.0, ans=0.2
2024-08-09 15:15:53,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.17 vs. limit=15.0
2024-08-09 15:15:56,645 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 30 from Vox, 27 from AS
2024-08-09 15:16:12,842 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 37 from LS+wenet, 19 from Vox, 32 from AS
2024-08-09 15:16:15,625 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 from AS
2024-08-09 15:16:16,793 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 from AS
2024-08-09 15:16:17,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=61900.0, ans=0.0
2024-08-09 15:16:20,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=61900.0, ans=0.0
2024-08-09 15:16:27,960 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 3.116e+01 3.579e+01 4.385e+01 6.920e+01, threshold=7.157e+01, percent-clipped=0.0
2024-08-09 15:16:27,984 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6200, loss[loss=0.09841, beats_loss=0.01956, ecapa_loss=0.0003757, whisper_loss=0.07509, over 18940.00 frames. ], tot_loss[loss=0.1249, beats_loss=0.01341, ecapa_loss=0.0004723, whisper_loss=0.1067, over 3879081.67 frames. ], batch size: 76, lr: 3.95e-02, grad_scale: 256.0
2024-08-09 15:16:29,454 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 29 from LS+wenet, 19 from Vox, 20 from AS
2024-08-09 15:16:37,236 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 21 from Vox, 30 from AS
2024-08-09 15:16:45,333 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.00 vs. limit=22.5
2024-08-09 15:17:07,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=62200.0, ans=0.125
2024-08-09 15:17:25,134 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 12 from LS+wenet, 20 from Vox, 30 from AS
2024-08-09 15:17:28,684 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5
2024-08-09 15:17:43,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0
2024-08-09 15:17:43,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6250, loss[loss=0.1191, beats_loss=0.01315, ecapa_loss=0.0004379, whisper_loss=0.1016, over 22656.00 frames. ], tot_loss[loss=0.124, beats_loss=0.01343, ecapa_loss=0.0004715, whisper_loss=0.1059, over 3854496.82 frames. ], batch size: 90, lr: 3.94e-02, grad_scale: 256.0
2024-08-09 15:17:59,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=62600.0, ans=0.0
2024-08-09 15:18:10,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=62600.0, ans=0.1
2024-08-09 15:18:32,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=62800.0, ans=0.0
2024-08-09 15:18:32,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=62800.0, ans=0.0
2024-08-09 15:18:38,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0
2024-08-09 15:18:50,068 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 18 from Vox, 24 from AS
2024-08-09 15:18:53,467 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 32 from LS+wenet, 15 from Vox, 48 from AS
2024-08-09 15:18:53,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=62900.0, ans=0.125
2024-08-09 15:19:00,070 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.965e+01 3.406e+01 4.255e+01 1.028e+02, threshold=6.812e+01, percent-clipped=2.0
2024-08-09 15:19:00,093 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6300, loss[loss=0.1245, beats_loss=0.01128, ecapa_loss=0.0005471, whisper_loss=0.1078, over 22877.00 frames. ], tot_loss[loss=0.124, beats_loss=0.01332, ecapa_loss=0.0004715, whisper_loss=0.106, over 3829292.86 frames. ], batch size: 94, lr: 3.94e-02, grad_scale: 256.0
2024-08-09 15:19:00,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=63000.0, ans=0.125
2024-08-09 15:19:08,690 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=15.0
2024-08-09 15:19:09,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.32 vs. limit=6.0
2024-08-09 15:19:10,685 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 15 from Vox, 38 from AS
2024-08-09 15:19:31,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=63200.0, ans=0.2
2024-08-09 15:19:45,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=63300.0, ans=0.125
2024-08-09 15:20:18,868 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6350, loss[loss=0.08463, beats_loss=0.0123, ecapa_loss=0.0004476, whisper_loss=0.06785, over 14896.00 frames. ], tot_loss[loss=0.124, beats_loss=0.01342, ecapa_loss=0.0004709, whisper_loss=0.1058, over 3822565.13 frames. ], batch size: 55, lr: 3.93e-02, grad_scale: 256.0
2024-08-09 15:20:24,028 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.73 vs. limit=15.0
2024-08-09 15:20:31,434 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 16 from Vox, 34 from AS
2024-08-09 15:20:37,678 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0
2024-08-09 15:20:55,836 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 from AS
2024-08-09 15:21:04,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=15.0
2024-08-09 15:21:17,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0
2024-08-09 15:21:21,639 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=1.030e-02
2024-08-09 15:21:21,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=63900.0, ans=0.125
2024-08-09 15:21:30,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=63900.0, ans=0.0
2024-08-09 15:21:35,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=63900.0, ans=0.2
2024-08-09 15:21:38,643 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 3.075e+01 3.568e+01 4.201e+01 6.933e+01, threshold=7.136e+01, percent-clipped=1.0
2024-08-09 15:21:38,677 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6400, loss[loss=0.1089, beats_loss=0.01415, ecapa_loss=0.0006201, whisper_loss=0.08854, over 17411.00 frames. ], tot_loss[loss=0.1237, beats_loss=0.01346, ecapa_loss=0.0004697, whisper_loss=0.1056, over 3853331.13 frames. ], batch size: 76, lr: 3.92e-02, grad_scale: 256.0
2024-08-09 15:21:40,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=64000.0, ans=0.035
2024-08-09 15:21:42,735 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-09 15:21:58,043 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 from AS
2024-08-09 15:21:59,521 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 from AS
2024-08-09 15:22:00,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0
2024-08-09 15:22:15,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=64200.0, ans=0.5
2024-08-09 15:22:15,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=64200.0, ans=0.1
2024-08-09 15:22:26,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=64300.0, ans=0.07
2024-08-09 15:22:31,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=32.24 vs. limit=22.5
2024-08-09 15:22:37,494 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 from AS
2024-08-09 15:22:41,334 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.35 vs. limit=22.5
2024-08-09 15:22:50,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.37 vs. limit=15.0
2024-08-09 15:22:51,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=64400.0, ans=0.0
2024-08-09 15:22:56,269 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 from AS
2024-08-09 15:22:57,764 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6450, loss[loss=0.143, beats_loss=0.01316, ecapa_loss=0.0004892, whisper_loss=0.1249, over 16289.00 frames. ], tot_loss[loss=0.1234, beats_loss=0.01357, ecapa_loss=0.0004684, whisper_loss=0.1052, over 3881015.81 frames. ], batch size: 67, lr: 3.92e-02, grad_scale: 256.0
2024-08-09 15:23:17,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=64600.0, ans=0.125
2024-08-09 15:23:40,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=64700.0, ans=0.125
2024-08-09 15:23:52,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=64800.0, ans=0.2
2024-08-09 15:23:56,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.37 vs. limit=22.5
2024-08-09 15:24:07,908 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 15 from Vox, 40 from AS
2024-08-09 15:24:09,122 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 11 from Vox, 38 from AS
2024-08-09 15:24:17,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 3.103e+01 3.527e+01 4.351e+01 8.335e+01, threshold=7.053e+01, percent-clipped=1.0
2024-08-09 15:24:17,673 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6500, loss[loss=0.09468, beats_loss=0.0176, ecapa_loss=0.0004234, whisper_loss=0.07285, over 20751.00 frames. ], tot_loss[loss=0.1238, beats_loss=0.01362, ecapa_loss=0.0004598, whisper_loss=0.1056, over 3911504.57 frames. ], batch size: 88, lr: 3.91e-02, grad_scale: 256.0
2024-08-09 15:24:21,575 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 10 from Vox, 30 from AS
2024-08-09 15:24:21,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=65000.0, ans=0.0
2024-08-09 15:24:25,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0
2024-08-09 15:24:47,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.0
2024-08-09 15:24:53,549 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 from AS
2024-08-09 15:25:04,652 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 from AS
2024-08-09 15:25:07,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=65300.0, ans=0.125
2024-08-09 15:25:08,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=65300.0, ans=0.1
2024-08-09 15:25:12,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=65300.0, ans=15.0
2024-08-09 15:25:14,302 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 36 from LS+wenet, 18 from Vox, 31 from AS
2024-08-09 15:25:16,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=65300.0, ans=0.125
2024-08-09 15:25:21,715 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.12 vs. limit=15.0
2024-08-09 15:25:32,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=65400.0, ans=0.125
2024-08-09 15:25:37,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0
2024-08-09 15:25:37,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6550, loss[loss=0.1291, beats_loss=0.01381, ecapa_loss=0.0004568, whisper_loss=0.1107, over 20128.00 frames. ], tot_loss[loss=0.1242, beats_loss=0.01368, ecapa_loss=0.0004585, whisper_loss=0.106, over 3898655.47 frames. ], batch size: 79, lr: 3.91e-02, grad_scale: 256.0
2024-08-09 15:25:43,498 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 17 from Vox, 28 from AS
2024-08-09 15:26:05,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=65600.0, ans=15.0
2024-08-09 15:26:10,997 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 37 from LS+wenet, 17 from Vox, 34 from AS
2024-08-09 15:26:12,671 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 from AS
2024-08-09 15:26:28,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=65800.0, ans=0.125
2024-08-09 15:26:32,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=65800.0, ans=0.125
2024-08-09 15:26:41,023 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 10 from Vox, 34 from AS
2024-08-09 15:26:42,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=65900.0, ans=0.0
2024-08-09 15:26:45,373 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 from AS
2024-08-09 15:26:57,205 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 3.063e+01 3.628e+01 4.391e+01 7.750e+01, threshold=7.256e+01, percent-clipped=3.0
2024-08-09 15:26:57,229 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6600, loss[loss=0.1105, beats_loss=0.01294, ecapa_loss=0.0004725, whisper_loss=0.09283, over 22189.00 frames. ], tot_loss[loss=0.1254, beats_loss=0.01349, ecapa_loss=0.0004627, whisper_loss=0.1073, over 3948109.84 frames.
], batch size: 91, lr: 3.90e-02, grad_scale: 256.0 2024-08-09 15:27:09,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=66000.0, ans=0.125 2024-08-09 15:27:29,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=66200.0, ans=0.04949747468305833 2024-08-09 15:27:30,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=66200.0, ans=0.2 2024-08-09 15:27:32,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=66200.0, ans=0.125 2024-08-09 15:27:34,991 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:27:36,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=66200.0, ans=0.0 2024-08-09 15:27:47,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=12.0 2024-08-09 15:28:00,345 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-09 15:28:10,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66400.0, ans=0.1 2024-08-09 15:28:14,636 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6650, loss[loss=0.1102, beats_loss=0.01435, ecapa_loss=0.00041, whisper_loss=0.09174, over 18496.00 frames. ], tot_loss[loss=0.1257, beats_loss=0.01349, ecapa_loss=0.0004624, whisper_loss=0.1075, over 3984527.18 frames. ], batch size: 74, lr: 3.89e-02, grad_scale: 256.0 2024-08-09 15:28:32,604 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 15:28:41,224 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 15:28:50,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=66700.0, ans=0.0 2024-08-09 15:28:51,946 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 15:29:06,737 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-09 15:29:25,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=66900.0, ans=0.2 2024-08-09 15:29:31,297 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 3.012e+01 3.433e+01 4.224e+01 7.038e+01, threshold=6.866e+01, percent-clipped=0.0 2024-08-09 15:29:31,318 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6700, loss[loss=0.1304, beats_loss=0.01489, ecapa_loss=0.000433, whisper_loss=0.1112, over 22355.00 frames. ], tot_loss[loss=0.1253, beats_loss=0.01353, ecapa_loss=0.0004603, whisper_loss=0.1072, over 3945898.60 frames. ], batch size: 90, lr: 3.89e-02, grad_scale: 256.0 2024-08-09 15:29:37,437 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-09 15:29:52,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67100.0, ans=0.1 2024-08-09 15:30:02,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=67200.0, ans=0.2 2024-08-09 15:30:04,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.91 vs. limit=10.0 2024-08-09 15:30:12,168 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 15:30:13,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=67200.0, ans=0.125 2024-08-09 15:30:34,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=67400.0, ans=0.125 2024-08-09 15:30:38,176 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 15:30:46,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67500.0, ans=0.1 2024-08-09 15:30:47,540 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6750, loss[loss=0.1135, beats_loss=0.01412, ecapa_loss=0.0004706, whisper_loss=0.09464, over 22375.00 frames. ], tot_loss[loss=0.1252, beats_loss=0.01346, ecapa_loss=0.0004611, whisper_loss=0.1071, over 3904104.76 frames. ], batch size: 93, lr: 3.88e-02, grad_scale: 256.0 2024-08-09 15:30:54,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=67500.0, ans=0.0 2024-08-09 15:31:15,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67600.0, ans=0.1 2024-08-09 15:31:15,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.24 vs. limit=15.0 2024-08-09 15:31:24,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=67700.0, ans=0.125 2024-08-09 15:31:58,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.31 vs. 
limit=22.5 2024-08-09 15:32:03,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.094e+01 3.540e+01 4.120e+01 7.157e+01, threshold=7.079e+01, percent-clipped=1.0 2024-08-09 15:32:03,572 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6800, loss[loss=0.1324, beats_loss=0.01292, ecapa_loss=0.0005463, whisper_loss=0.114, over 18084.00 frames. ], tot_loss[loss=0.1247, beats_loss=0.01347, ecapa_loss=0.0004605, whisper_loss=0.1066, over 3885984.81 frames. ], batch size: 73, lr: 3.87e-02, grad_scale: 256.0 2024-08-09 15:32:07,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=68000.0, ans=0.125 2024-08-09 15:32:09,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=68000.0, ans=0.0 2024-08-09 15:32:31,533 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 28 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-09 15:32:32,745 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-09 15:32:42,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=68200.0, ans=0.0 2024-08-09 15:33:17,943 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6850, loss[loss=0.1228, beats_loss=0.01283, ecapa_loss=0.0004666, whisper_loss=0.1053, over 19651.00 frames. ], tot_loss[loss=0.1254, beats_loss=0.01336, ecapa_loss=0.0004616, whisper_loss=0.1074, over 3890051.48 frames. ], batch size: 80, lr: 3.87e-02, grad_scale: 256.0 2024-08-09 15:33:33,639 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 15:34:10,946 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.72 vs. limit=22.5 2024-08-09 15:34:13,727 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
26 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-09 15:34:31,800 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-09 15:34:33,090 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 3.018e+01 3.583e+01 4.075e+01 7.184e+01, threshold=7.167e+01, percent-clipped=2.0 2024-08-09 15:34:33,110 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6900, loss[loss=0.119, beats_loss=0.01346, ecapa_loss=0.0004187, whisper_loss=0.1013, over 22589.00 frames. ], tot_loss[loss=0.1244, beats_loss=0.01347, ecapa_loss=0.0004586, whisper_loss=0.1064, over 3879382.68 frames. ], batch size: 91, lr: 3.86e-02, grad_scale: 256.0 2024-08-09 15:34:35,326 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-09 15:34:40,831 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-09 15:35:01,223 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-09 15:35:07,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2024-08-09 15:35:33,110 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=6.865e+00 2024-08-09 15:35:34,340 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 12 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-09 15:35:49,168 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-08-09 15:35:49,648 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 6950, loss[loss=0.131, beats_loss=0.01352, ecapa_loss=0.0004538, whisper_loss=0.113, over 15822.00 frames. 
], tot_loss[loss=0.1239, beats_loss=0.01345, ecapa_loss=0.0004557, whisper_loss=0.1059, over 3854117.36 frames. ], batch size: 63, lr: 3.85e-02, grad_scale: 256.0 2024-08-09 15:35:50,412 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=12.0 2024-08-09 15:36:17,316 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.50 vs. limit=22.5 2024-08-09 15:36:30,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=69700.0, ans=0.0 2024-08-09 15:36:37,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.47 vs. limit=15.0 2024-08-09 15:36:52,029 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.31 vs. limit=15.0 2024-08-09 15:37:07,923 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 3.090e+01 3.523e+01 4.430e+01 8.295e+01, threshold=7.046e+01, percent-clipped=3.0 2024-08-09 15:37:07,943 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7000, loss[loss=0.1404, beats_loss=0.0112, ecapa_loss=0.0004724, whisper_loss=0.1245, over 18697.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01347, ecapa_loss=0.0004544, whisper_loss=0.1055, over 3854522.78 frames. ], batch size: 71, lr: 3.85e-02, grad_scale: 256.0 2024-08-09 15:37:09,922 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
19 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 15:37:14,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=70000.0, ans=0.125 2024-08-09 15:37:35,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.26 vs. limit=22.5 2024-08-09 15:37:36,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=70100.0, ans=0.0 2024-08-09 15:37:38,514 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.57 vs. limit=10.0 2024-08-09 15:37:43,875 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-09 15:37:50,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70200.0, ans=0.1 2024-08-09 15:38:29,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7050, loss[loss=0.1217, beats_loss=0.01293, ecapa_loss=0.0003836, whisper_loss=0.1049, over 18271.00 frames. ], tot_loss[loss=0.1233, beats_loss=0.01342, ecapa_loss=0.0004543, whisper_loss=0.1053, over 3877316.80 frames. ], batch size: 70, lr: 3.84e-02, grad_scale: 256.0 2024-08-09 15:38:41,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=70500.0, ans=0.1 2024-08-09 15:38:43,592 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 15:38:43,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=70600.0, ans=0.125 2024-08-09 15:38:49,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=70600.0, ans=0.0 2024-08-09 15:38:55,188 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-09 15:39:01,027 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 15:39:15,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.56 vs. limit=22.5 2024-08-09 15:39:17,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=70800.0, ans=0.1 2024-08-09 15:39:26,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=70800.0, ans=0.0 2024-08-09 15:39:37,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=70900.0, ans=0.1 2024-08-09 15:39:43,668 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.931e+01 3.439e+01 4.149e+01 6.385e+01, threshold=6.878e+01, percent-clipped=0.0 2024-08-09 15:39:43,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7100, loss[loss=0.1378, beats_loss=0.01319, ecapa_loss=0.0004022, whisper_loss=0.1206, over 23261.00 frames. ], tot_loss[loss=0.1234, beats_loss=0.01342, ecapa_loss=0.0004507, whisper_loss=0.1054, over 3866915.68 frames. 
], batch size: 93, lr: 3.83e-02, grad_scale: 256.0 2024-08-09 15:39:45,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71000.0, ans=0.1 2024-08-09 15:39:46,841 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:39:51,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=71000.0, ans=0.125 2024-08-09 15:39:58,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=71100.0, ans=0.2 2024-08-09 15:40:04,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71100.0, ans=0.1 2024-08-09 15:40:13,541 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2024-08-09 15:40:25,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=71200.0, ans=0.125 2024-08-09 15:40:29,691 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 15 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 15:40:45,318 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-09 15:40:45,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71400.0, ans=0.1 2024-08-09 15:40:52,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=71400.0, ans=0.125 2024-08-09 15:41:00,823 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7150, loss[loss=0.1326, beats_loss=0.01288, ecapa_loss=0.0003467, whisper_loss=0.1162, over 19154.00 frames. 
], tot_loss[loss=0.1231, beats_loss=0.01346, ecapa_loss=0.0004496, whisper_loss=0.1051, over 3861798.38 frames. ], batch size: 71, lr: 3.83e-02, grad_scale: 256.0 2024-08-09 15:41:21,736 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-09 15:41:23,001 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-09 15:41:29,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=71700.0, ans=0.0 2024-08-09 15:41:44,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=71800.0, ans=0.0 2024-08-09 15:41:53,319 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 15:42:03,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=71900.0, ans=0.125 2024-08-09 15:42:03,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71900.0, ans=0.1 2024-08-09 15:42:21,649 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 3.087e+01 3.536e+01 4.239e+01 7.384e+01, threshold=7.073e+01, percent-clipped=1.0 2024-08-09 15:42:21,669 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7200, loss[loss=0.1174, beats_loss=0.01228, ecapa_loss=0.0004715, whisper_loss=0.1004, over 21394.00 frames. ], tot_loss[loss=0.1231, beats_loss=0.01345, ecapa_loss=0.0004489, whisper_loss=0.1052, over 3872855.90 frames. ], batch size: 87, lr: 3.82e-02, grad_scale: 256.0 2024-08-09 15:42:55,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=72100.0, ans=0.035 2024-08-09 15:43:00,260 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
30 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-09 15:43:04,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.05 vs. limit=15.0 2024-08-09 15:43:16,938 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 15:43:20,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=72300.0, ans=0.2 2024-08-09 15:43:27,000 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0 2024-08-09 15:43:34,152 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 15:43:46,367 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7250, loss[loss=0.1387, beats_loss=0.013, ecapa_loss=0.0004316, whisper_loss=0.1213, over 16135.00 frames. ], tot_loss[loss=0.1242, beats_loss=0.01341, ecapa_loss=0.0004493, whisper_loss=0.1063, over 3880769.22 frames. ], batch size: 63, lr: 3.82e-02, grad_scale: 256.0 2024-08-09 15:43:50,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=72500.0, ans=0.125 2024-08-09 15:44:07,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=72600.0, ans=0.0 2024-08-09 15:44:19,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.65 vs. limit=22.5 2024-08-09 15:44:39,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=72700.0, ans=0.1 2024-08-09 15:44:44,091 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-09 15:44:55,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=12.0 2024-08-09 15:45:00,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=72800.0, ans=0.0 2024-08-09 15:45:01,252 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 15:45:12,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=72900.0, ans=0.0 2024-08-09 15:45:14,249 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0 2024-08-09 15:45:20,046 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.182e+01 3.061e+01 3.709e+01 4.320e+01 7.317e+01, threshold=7.418e+01, percent-clipped=1.0 2024-08-09 15:45:20,071 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7300, loss[loss=0.08886, beats_loss=0.01576, ecapa_loss=0.0004213, whisper_loss=0.06889, over 15568.00 frames. ], tot_loss[loss=0.1239, beats_loss=0.01342, ecapa_loss=0.0004499, whisper_loss=0.1059, over 3857463.80 frames. ], batch size: 64, lr: 3.81e-02, grad_scale: 256.0 2024-08-09 15:45:25,997 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-09 15:45:37,359 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-09 15:45:45,564 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 15:46:05,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=73200.0, ans=0.1 2024-08-09 15:46:22,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=15.0 2024-08-09 15:46:29,413 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 15:46:50,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-08-09 15:46:52,756 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7350, loss[loss=0.1236, beats_loss=0.01391, ecapa_loss=0.0004017, whisper_loss=0.1057, over 20292.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.01349, ecapa_loss=0.0004525, whisper_loss=0.1046, over 3843529.37 frames. ], batch size: 82, lr: 3.80e-02, grad_scale: 256.0 2024-08-09 15:47:11,335 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.09 vs. limit=15.0 2024-08-09 15:47:25,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.70 vs. limit=10.0 2024-08-09 15:47:29,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=73700.0, ans=0.125 2024-08-09 15:47:32,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.67 vs. 
limit=15.0 2024-08-09 15:47:33,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=73700.0, ans=0.0 2024-08-09 15:47:49,463 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 15 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-09 15:47:52,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=73800.0, ans=0.125 2024-08-09 15:47:56,307 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=12.0 2024-08-09 15:47:57,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=73800.0, ans=0.0 2024-08-09 15:47:57,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=73800.0, ans=0.125 2024-08-09 15:48:14,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=15.0 2024-08-09 15:48:21,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.843e+01 3.393e+01 4.039e+01 7.371e+01, threshold=6.786e+01, percent-clipped=0.0 2024-08-09 15:48:21,324 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7400, loss[loss=0.1216, beats_loss=0.01344, ecapa_loss=0.0004348, whisper_loss=0.1039, over 21589.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01361, ecapa_loss=0.00045, whisper_loss=0.1038, over 3849330.73 frames. ], batch size: 86, lr: 3.80e-02, grad_scale: 256.0 2024-08-09 15:48:21,445 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
23 from LS+wenet, 19 from Vox, 52 fro AS 2024-08-09 15:48:23,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=74000.0, ans=0.125 2024-08-09 15:48:30,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=74000.0, ans=0.1 2024-08-09 15:48:36,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=74000.0, ans=0.2 2024-08-09 15:48:51,570 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:49:03,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2024-08-09 15:49:33,450 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.03 vs. limit=10.0 2024-08-09 15:49:42,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=74400.0, ans=0.125 2024-08-09 15:49:44,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=74400.0, ans=15.0 2024-08-09 15:49:50,630 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 13 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 15:49:52,697 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-09 15:49:56,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7450, loss[loss=0.1243, beats_loss=0.01486, ecapa_loss=0.0004961, whisper_loss=0.1045, over 14297.00 frames. ], tot_loss[loss=0.121, beats_loss=0.01375, ecapa_loss=0.0004461, whisper_loss=0.1028, over 3853470.89 frames. 
], batch size: 59, lr: 3.79e-02, grad_scale: 256.0 2024-08-09 15:49:58,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=74500.0, ans=0.125 2024-08-09 15:50:00,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=74500.0, ans=0.2 2024-08-09 15:50:08,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=74500.0, ans=0.2 2024-08-09 15:50:08,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=74500.0, ans=0.2 2024-08-09 15:50:08,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=74500.0, ans=0.125 2024-08-09 15:50:22,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=74600.0, ans=0.2 2024-08-09 15:50:33,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=74700.0, ans=0.0 2024-08-09 15:50:37,719 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-09 15:50:45,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=74800.0, ans=0.125 2024-08-09 15:50:50,878 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-09 15:50:59,993 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 15:51:02,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=74900.0, ans=0.125 2024-08-09 15:51:13,749 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 3.130e+01 3.399e+01 4.155e+01 7.076e+01, threshold=6.798e+01, percent-clipped=1.0 2024-08-09 15:51:13,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7500, loss[loss=0.1424, beats_loss=0.008938, ecapa_loss=0.0006049, whisper_loss=0.1274, over 15267.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01361, ecapa_loss=0.0004456, whisper_loss=0.1046, over 3910712.33 frames. ], batch size: 58, lr: 3.78e-02, grad_scale: 256.0 2024-08-09 15:51:24,634 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 15:51:24,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=75000.0, ans=0.125 2024-08-09 15:51:34,662 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 15:51:43,059 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-09 15:51:53,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.97 vs. 
limit=15.0 2024-08-09 15:51:54,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=75200.0, ans=0.0 2024-08-09 15:52:04,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=75300.0, ans=0.125 2024-08-09 15:52:08,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=75300.0, ans=0.125 2024-08-09 15:52:24,856 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7550, loss[loss=0.1375, beats_loss=0.0153, ecapa_loss=0.0003467, whisper_loss=0.1187, over 22316.00 frames. ], tot_loss[loss=0.1223, beats_loss=0.01363, ecapa_loss=0.0004427, whisper_loss=0.1042, over 3871002.45 frames. ], batch size: 87, lr: 3.78e-02, grad_scale: 256.0 2024-08-09 15:52:29,955 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-09 15:52:42,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=75600.0, ans=0.125 2024-08-09 15:52:44,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.62 vs. 
limit=15.0 2024-08-09 15:52:45,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=75600.0, ans=0.125 2024-08-09 15:53:01,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=75700.0, ans=15.0 2024-08-09 15:53:15,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=75800.0, ans=0.0 2024-08-09 15:53:28,000 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2024-08-09 15:53:33,236 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=17.60 vs. limit=15.0 2024-08-09 15:53:35,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.089e+01 3.036e+01 3.542e+01 4.226e+01 5.898e+01, threshold=7.084e+01, percent-clipped=0.0 2024-08-09 15:53:35,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7600, loss[loss=0.1254, beats_loss=0.01655, ecapa_loss=0.0003873, whisper_loss=0.1049, over 19026.00 frames. ], tot_loss[loss=0.123, beats_loss=0.01344, ecapa_loss=0.000444, whisper_loss=0.1051, over 3872654.53 frames. ], batch size: 76, lr: 3.77e-02, grad_scale: 256.0 2024-08-09 15:53:41,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=76000.0, ans=0.0 2024-08-09 15:53:44,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76000.0, ans=0.1 2024-08-09 15:53:52,974 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:54:00,913 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
24 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-09 15:54:09,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=76200.0, ans=0.2 2024-08-09 15:54:16,127 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 15:54:25,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=76300.0, ans=0.0 2024-08-09 15:54:31,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=76400.0, ans=0.0 2024-08-09 15:54:33,522 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-09 15:54:40,087 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-09 15:54:46,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7650, loss[loss=0.1206, beats_loss=0.01112, ecapa_loss=0.0005718, whisper_loss=0.1037, over 18794.00 frames. ], tot_loss[loss=0.1231, beats_loss=0.01339, ecapa_loss=0.0004453, whisper_loss=0.1053, over 3863630.49 frames. ], batch size: 79, lr: 3.77e-02, grad_scale: 256.0 2024-08-09 15:55:00,760 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-09 15:55:05,578 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.71 vs. 
limit=15.0 2024-08-09 15:55:16,186 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.207e+00 2024-08-09 15:55:20,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=76700.0, ans=0.025 2024-08-09 15:55:24,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=76700.0, ans=0.0 2024-08-09 15:55:26,564 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 13 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 15:55:35,610 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.77 vs. limit=15.0 2024-08-09 15:55:40,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=76900.0, ans=0.1 2024-08-09 15:55:44,878 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:55:45,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.77 vs. limit=15.0 2024-08-09 15:55:47,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=76900.0, ans=0.0 2024-08-09 15:55:50,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=76900.0, ans=0.05 2024-08-09 15:55:54,058 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-09 15:55:55,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.321e+01 3.065e+01 3.556e+01 4.140e+01 7.466e+01, threshold=7.113e+01, percent-clipped=1.0 2024-08-09 15:55:55,150 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7700, loss[loss=0.1212, beats_loss=0.0126, ecapa_loss=0.0004588, whisper_loss=0.104, over 17325.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01342, ecapa_loss=0.0004449, whisper_loss=0.1041, over 3852239.37 frames. ], batch size: 69, lr: 3.76e-02, grad_scale: 256.0 2024-08-09 15:55:55,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=77000.0, ans=0.0 2024-08-09 15:55:58,232 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 15:56:05,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=77000.0, ans=0.0 2024-08-09 15:56:21,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=77100.0, ans=0.125 2024-08-09 15:56:23,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=77200.0, ans=0.125 2024-08-09 15:56:24,464 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-09 15:56:32,925 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 15:56:34,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=77200.0, ans=0.0 2024-08-09 15:56:35,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=77200.0, ans=0.2 2024-08-09 15:56:38,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=77300.0, ans=0.0 2024-08-09 15:56:42,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.13 vs. limit=22.5 2024-08-09 15:56:44,473 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 19 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-09 15:56:48,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0 2024-08-09 15:56:57,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=77400.0, ans=0.125 2024-08-09 15:57:07,536 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7750, loss[loss=0.1241, beats_loss=0.01483, ecapa_loss=0.0002854, whisper_loss=0.1064, over 21674.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01348, ecapa_loss=0.0004402, whisper_loss=0.1037, over 3891502.64 frames. ], batch size: 82, lr: 3.75e-02, grad_scale: 256.0 2024-08-09 15:57:13,370 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-09 15:57:13,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=77500.0, ans=0.125 2024-08-09 15:57:16,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=77500.0, ans=0.0 2024-08-09 15:57:26,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=77600.0, ans=0.125 2024-08-09 15:57:28,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=77600.0, ans=0.125 2024-08-09 15:57:41,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=77700.0, ans=0.2 2024-08-09 15:57:51,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=77800.0, ans=0.125 2024-08-09 15:57:59,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=77800.0, ans=0.125 2024-08-09 15:58:15,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2024-08-09 15:58:17,275 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+01 2.915e+01 3.303e+01 4.126e+01 7.711e+01, threshold=6.607e+01, percent-clipped=1.0 2024-08-09 15:58:17,301 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7800, loss[loss=0.138, beats_loss=0.008549, ecapa_loss=0.0004852, whisper_loss=0.1246, over 16579.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01341, ecapa_loss=0.0004429, whisper_loss=0.1049, over 3899307.58 frames. 
], batch size: 64, lr: 3.75e-02, grad_scale: 256.0 2024-08-09 15:58:19,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=78000.0, ans=0.2 2024-08-09 15:58:20,231 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-09 15:58:20,765 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=12.87 vs. limit=10.0 2024-08-09 15:58:30,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=78100.0, ans=0.1 2024-08-09 15:58:53,099 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 15:59:01,264 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 15:59:02,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=78300.0, ans=0.0 2024-08-09 15:59:15,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=12.0 2024-08-09 15:59:19,512 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 15:59:25,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=78500.0, ans=0.125 2024-08-09 15:59:26,497 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7850, loss[loss=0.1486, beats_loss=0.01126, ecapa_loss=0.0005329, whisper_loss=0.132, over 22121.00 frames. ], tot_loss[loss=0.123, beats_loss=0.01337, ecapa_loss=0.0004401, whisper_loss=0.1052, over 3901820.26 frames. 
], batch size: 89, lr: 3.74e-02, grad_scale: 256.0 2024-08-09 15:59:46,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=78600.0, ans=0.0 2024-08-09 15:59:49,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2024-08-09 15:59:58,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=78700.0, ans=0.2 2024-08-09 15:59:59,350 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.20 vs. limit=15.0 2024-08-09 16:00:02,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=78700.0, ans=0.035 2024-08-09 16:00:08,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=78800.0, ans=0.125 2024-08-09 16:00:20,582 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.15 vs. limit=12.0 2024-08-09 16:00:25,329 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 16:00:35,141 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 3.036e+01 3.521e+01 4.450e+01 7.582e+01, threshold=7.043e+01, percent-clipped=4.0 2024-08-09 16:00:35,163 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7900, loss[loss=0.1456, beats_loss=0.01166, ecapa_loss=0.0004341, whisper_loss=0.1296, over 22365.00 frames. ], tot_loss[loss=0.1225, beats_loss=0.01336, ecapa_loss=0.0004397, whisper_loss=0.1047, over 3879050.37 frames. 
], batch size: 86, lr: 3.73e-02, grad_scale: 256.0 2024-08-09 16:00:47,063 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=22.5 2024-08-09 16:01:04,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=79200.0, ans=0.125 2024-08-09 16:01:15,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2024-08-09 16:01:16,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=79300.0, ans=0.125 2024-08-09 16:01:26,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=79300.0, ans=0.125 2024-08-09 16:01:37,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.38 vs. limit=22.5 2024-08-09 16:01:41,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=79400.0, ans=0.0 2024-08-09 16:01:43,801 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 7950, loss[loss=0.1045, beats_loss=0.01408, ecapa_loss=0.0004543, whisper_loss=0.08592, over 15871.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.01343, ecapa_loss=0.0004366, whisper_loss=0.1048, over 3884413.58 frames. ], batch size: 65, lr: 3.73e-02, grad_scale: 256.0 2024-08-09 16:01:53,340 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-09 16:02:18,913 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-09 16:02:23,733 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.92 vs. limit=10.0 2024-08-09 16:02:26,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=79800.0, ans=0.0 2024-08-09 16:02:32,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.40 vs. limit=22.5 2024-08-09 16:02:54,515 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+01 3.066e+01 3.561e+01 4.217e+01 9.530e+01, threshold=7.122e+01, percent-clipped=2.0 2024-08-09 16:02:54,535 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8000, loss[loss=0.1311, beats_loss=0.01239, ecapa_loss=0.0004915, whisper_loss=0.1138, over 15775.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01336, ecapa_loss=0.0004351, whisper_loss=0.105, over 3863989.10 frames. ], batch size: 63, lr: 3.72e-02, grad_scale: 512.0 2024-08-09 16:03:02,625 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-09 16:03:15,196 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 16 from Vox, 53 fro AS 2024-08-09 16:03:15,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2024-08-09 16:03:24,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=80200.0, ans=0.125 2024-08-09 16:03:25,964 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
32 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-09 16:03:32,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=80200.0, ans=0.125 2024-08-09 16:03:35,344 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-09 16:03:35,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=23.84 vs. limit=15.0 2024-08-09 16:03:45,887 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 16:03:54,050 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 40 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-09 16:04:01,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8050, loss[loss=0.1232, beats_loss=0.01437, ecapa_loss=0.0003731, whisper_loss=0.1051, over 18042.00 frames. ], tot_loss[loss=0.1228, beats_loss=0.01329, ecapa_loss=0.0004346, whisper_loss=0.1052, over 3850772.55 frames. ], batch size: 72, lr: 3.72e-02, grad_scale: 512.0 2024-08-09 16:04:12,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=80500.0, ans=0.0 2024-08-09 16:04:27,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0 2024-08-09 16:04:41,887 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 16:04:47,465 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
34 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-09 16:04:53,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=80800.0, ans=0.125 2024-08-09 16:05:10,264 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 3.011e+01 3.515e+01 4.189e+01 8.391e+01, threshold=7.029e+01, percent-clipped=0.0 2024-08-09 16:05:10,285 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8100, loss[loss=0.1101, beats_loss=0.01292, ecapa_loss=0.000465, whisper_loss=0.0925, over 21742.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.01327, ecapa_loss=0.0004336, whisper_loss=0.105, over 3896929.08 frames. ], batch size: 89, lr: 3.71e-02, grad_scale: 512.0 2024-08-09 16:05:17,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2024-08-09 16:05:42,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=81200.0, ans=0.0 2024-08-09 16:05:42,366 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.79 vs. limit=22.5 2024-08-09 16:05:48,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=81200.0, ans=0.125 2024-08-09 16:05:51,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. 
limit=6.0 2024-08-09 16:05:52,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=81300.0, ans=0.0 2024-08-09 16:05:57,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=81300.0, ans=0.125 2024-08-09 16:06:01,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=81300.0, ans=0.125 2024-08-09 16:06:05,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81400.0, ans=0.1 2024-08-09 16:06:11,916 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-09 16:06:18,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=81500.0, ans=0.125 2024-08-09 16:06:19,026 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8150, loss[loss=0.1048, beats_loss=0.01183, ecapa_loss=0.0005225, whisper_loss=0.08778, over 15515.00 frames. ], tot_loss[loss=0.123, beats_loss=0.01327, ecapa_loss=0.0004361, whisper_loss=0.1053, over 3894963.61 frames. ], batch size: 66, lr: 3.70e-02, grad_scale: 512.0 2024-08-09 16:06:25,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=81500.0, ans=0.0 2024-08-09 16:06:38,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2024-08-09 16:06:48,774 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-09 16:07:01,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=81800.0, ans=0.0 2024-08-09 16:07:07,929 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-09 16:07:23,778 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-09 16:07:27,523 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 3.111e+01 3.553e+01 4.149e+01 8.297e+01, threshold=7.106e+01, percent-clipped=2.0 2024-08-09 16:07:27,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8200, loss[loss=0.1242, beats_loss=0.01263, ecapa_loss=0.0004874, whisper_loss=0.1067, over 21577.00 frames. ], tot_loss[loss=0.123, beats_loss=0.01328, ecapa_loss=0.0004363, whisper_loss=0.1054, over 3912500.52 frames. ], batch size: 89, lr: 3.70e-02, grad_scale: 512.0 2024-08-09 16:07:27,716 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 16:07:53,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=82200.0, ans=0.125 2024-08-09 16:08:02,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=82200.0, ans=0.95 2024-08-09 16:08:23,954 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-09 16:08:33,544 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-09 16:08:36,073 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8250, loss[loss=0.1259, beats_loss=0.01455, ecapa_loss=0.0004287, whisper_loss=0.1071, over 20167.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.01336, ecapa_loss=0.0004362, whisper_loss=0.1049, over 3895003.74 frames. 
], batch size: 82, lr: 3.69e-02, grad_scale: 512.0 2024-08-09 16:08:43,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.13 vs. limit=10.0 2024-08-09 16:08:47,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=82500.0, ans=0.2 2024-08-09 16:08:51,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.34 vs. limit=10.0 2024-08-09 16:09:29,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82800.0, ans=0.1 2024-08-09 16:09:31,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=82900.0, ans=0.125 2024-08-09 16:09:44,587 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.998e+01 3.523e+01 3.969e+01 6.917e+01, threshold=7.045e+01, percent-clipped=0.0 2024-08-09 16:09:44,615 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8300, loss[loss=0.1028, beats_loss=0.01346, ecapa_loss=0.0003311, whisper_loss=0.08604, over 15017.00 frames. ], tot_loss[loss=0.1225, beats_loss=0.0134, ecapa_loss=0.0004295, whisper_loss=0.1048, over 3891819.59 frames. 
], batch size: 56, lr: 3.68e-02, grad_scale: 512.0 2024-08-09 16:09:50,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=83000.0, ans=0.125 2024-08-09 16:09:51,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=83000.0, ans=0.125 2024-08-09 16:09:54,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=83000.0, ans=0.035 2024-08-09 16:10:05,518 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-09 16:10:20,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=83200.0, ans=0.2 2024-08-09 16:10:21,969 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-09 16:10:36,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=83300.0, ans=0.0 2024-08-09 16:10:57,670 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8350, loss[loss=0.1095, beats_loss=0.01321, ecapa_loss=0.0004621, whisper_loss=0.09167, over 21552.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01354, ecapa_loss=0.000429, whisper_loss=0.1041, over 3904672.99 frames. ], batch size: 89, lr: 3.68e-02, grad_scale: 512.0 2024-08-09 16:10:57,916 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-09 16:10:58,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. 
limit=15.0 2024-08-09 16:11:14,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=83600.0, ans=0.125 2024-08-09 16:11:29,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2024-08-09 16:11:35,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=83700.0, ans=0.015 2024-08-09 16:11:39,488 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 16:12:04,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=83900.0, ans=0.0 2024-08-09 16:12:08,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 3.077e+01 3.401e+01 4.133e+01 6.317e+01, threshold=6.802e+01, percent-clipped=0.0 2024-08-09 16:12:08,720 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8400, loss[loss=0.1028, beats_loss=0.01446, ecapa_loss=0.0005013, whisper_loss=0.08335, over 18439.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01355, ecapa_loss=0.0004288, whisper_loss=0.1041, over 3879973.44 frames. ], batch size: 78, lr: 3.67e-02, grad_scale: 512.0 2024-08-09 16:12:13,555 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 16:12:15,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=84000.0, ans=0.0 2024-08-09 16:12:35,314 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 16:12:47,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.81 vs. 
limit=15.0 2024-08-09 16:13:17,488 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.79 vs. limit=15.0 2024-08-09 16:13:30,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8450, loss[loss=0.1238, beats_loss=0.01158, ecapa_loss=0.0004153, whisper_loss=0.108, over 19770.00 frames. ], tot_loss[loss=0.1232, beats_loss=0.01345, ecapa_loss=0.0004258, whisper_loss=0.1055, over 3876774.49 frames. ], batch size: 76, lr: 3.67e-02, grad_scale: 512.0 2024-08-09 16:13:41,839 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-09 16:13:49,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=84600.0, ans=0.125 2024-08-09 16:13:56,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=84600.0, ans=0.0 2024-08-09 16:13:58,063 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 11 from LS+wenet, 29 from Vox, 17 fro AS 2024-08-09 16:14:00,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=84700.0, ans=0.125 2024-08-09 16:14:04,451 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 16:14:06,928 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-09 16:14:07,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=84700.0, ans=0.05 2024-08-09 16:14:15,672 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-09 16:14:25,768 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 16:14:29,245 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 16:14:51,038 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+01 2.954e+01 3.407e+01 4.304e+01 7.894e+01, threshold=6.814e+01, percent-clipped=2.0 2024-08-09 16:14:51,059 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8500, loss[loss=0.144, beats_loss=0.01222, ecapa_loss=0.0004137, whisper_loss=0.1277, over 22156.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01325, ecapa_loss=0.0004272, whisper_loss=0.106, over 3857888.28 frames. ], batch size: 84, lr: 3.66e-02, grad_scale: 512.0 2024-08-09 16:14:58,781 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.95 vs. limit=22.5 2024-08-09 16:15:05,546 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-09 16:15:19,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=85100.0, ans=0.0 2024-08-09 16:15:28,862 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-09 16:15:36,699 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-09 16:15:39,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0 2024-08-09 16:16:02,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.60 vs. 
limit=15.0 2024-08-09 16:16:05,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=85300.0, ans=0.125 2024-08-09 16:16:13,458 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 16:16:13,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=85400.0, ans=0.05 2024-08-09 16:16:21,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=85400.0, ans=0.07 2024-08-09 16:16:27,000 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8550, loss[loss=0.1162, beats_loss=0.01762, ecapa_loss=0.0003985, whisper_loss=0.09455, over 21809.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01317, ecapa_loss=0.000426, whisper_loss=0.1061, over 3856936.97 frames. ], batch size: 90, lr: 3.65e-02, grad_scale: 512.0 2024-08-09 16:16:48,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=85600.0, ans=0.2 2024-08-09 16:16:54,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=85600.0, ans=0.125 2024-08-09 16:17:01,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=85600.0, ans=0.1 2024-08-09 16:17:12,592 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
25 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 16:17:12,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=85700.0, ans=0.0 2024-08-09 16:17:25,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=85800.0, ans=0.125 2024-08-09 16:17:25,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=85800.0, ans=0.07 2024-08-09 16:17:56,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.53 vs. limit=22.5 2024-08-09 16:18:03,683 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+01 2.923e+01 3.374e+01 4.145e+01 6.398e+01, threshold=6.748e+01, percent-clipped=0.0 2024-08-09 16:18:03,703 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8600, loss[loss=0.1227, beats_loss=0.01216, ecapa_loss=0.0004856, whisper_loss=0.1057, over 19907.00 frames. ], tot_loss[loss=0.1237, beats_loss=0.01321, ecapa_loss=0.0004255, whisper_loss=0.1062, over 3856870.72 frames. ], batch size: 82, lr: 3.65e-02, grad_scale: 512.0 2024-08-09 16:18:33,981 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-09 16:18:59,396 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-09 16:19:14,736 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-09 16:19:26,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=86400.0, ans=0.04949747468305833 2024-08-09 16:19:26,979 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 16:19:40,833 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8650, loss[loss=0.1584, beats_loss=0.007332, ecapa_loss=0.0004754, whisper_loss=0.1463, over 22597.00 frames. ], tot_loss[loss=0.1231, beats_loss=0.01323, ecapa_loss=0.0004241, whisper_loss=0.1057, over 3855014.05 frames. ], batch size: 85, lr: 3.64e-02, grad_scale: 512.0 2024-08-09 16:19:46,061 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 16:19:48,704 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 16:20:08,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.02 vs. limit=10.0 2024-08-09 16:20:19,582 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 16:20:47,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=86900.0, ans=0.125 2024-08-09 16:20:47,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=12.0 2024-08-09 16:20:59,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.913e+01 3.504e+01 4.209e+01 7.626e+01, threshold=7.009e+01, percent-clipped=5.0 2024-08-09 16:20:59,477 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8700, loss[loss=0.1105, beats_loss=0.0136, ecapa_loss=0.0003248, whisper_loss=0.09363, over 17962.00 frames. ], tot_loss[loss=0.123, beats_loss=0.01337, ecapa_loss=0.0004233, whisper_loss=0.1054, over 3871951.74 frames. ], batch size: 66, lr: 3.64e-02, grad_scale: 512.0 2024-08-09 16:21:01,413 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
26 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-09 16:21:35,871 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-09 16:21:37,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=87200.0, ans=0.1 2024-08-09 16:21:48,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=87200.0, ans=0.0 2024-08-09 16:21:50,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=87200.0, ans=0.0 2024-08-09 16:21:53,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=87200.0, ans=0.0 2024-08-09 16:21:54,550 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-09 16:21:54,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=87200.0, ans=0.0 2024-08-09 16:21:58,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87300.0, ans=0.1 2024-08-09 16:22:15,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87400.0, ans=0.1 2024-08-09 16:22:28,826 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8750, loss[loss=0.1257, beats_loss=0.01428, ecapa_loss=0.0003343, whisper_loss=0.1081, over 20126.00 frames. ], tot_loss[loss=0.1225, beats_loss=0.01322, ecapa_loss=0.0004292, whisper_loss=0.105, over 3842485.31 frames. ], batch size: 77, lr: 3.63e-02, grad_scale: 512.0 2024-08-09 16:22:49,236 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
16 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-09 16:22:56,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=87700.0, ans=0.0 2024-08-09 16:22:57,173 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-09 16:23:04,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=87700.0, ans=0.0 2024-08-09 16:23:11,103 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 16 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-09 16:23:17,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.35 vs. limit=15.0 2024-08-09 16:23:26,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=87900.0, ans=0.0 2024-08-09 16:23:39,377 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.881e+01 3.394e+01 4.029e+01 7.137e+01, threshold=6.788e+01, percent-clipped=1.0 2024-08-09 16:23:39,398 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8800, loss[loss=0.1547, beats_loss=0.01035, ecapa_loss=0.0004512, whisper_loss=0.1398, over 21574.00 frames. ], tot_loss[loss=0.1228, beats_loss=0.01326, ecapa_loss=0.0004261, whisper_loss=0.1053, over 3848653.25 frames. ], batch size: 81, lr: 3.62e-02, grad_scale: 512.0 2024-08-09 16:23:52,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=88100.0, ans=0.125 2024-08-09 16:23:57,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=88100.0, ans=0.125 2024-08-09 16:24:00,862 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-09 16:24:09,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=88200.0, ans=0.125 2024-08-09 16:24:35,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=88400.0, ans=0.125 2024-08-09 16:24:50,412 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-09 16:24:51,511 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8850, loss[loss=0.1321, beats_loss=0.01186, ecapa_loss=0.0003103, whisper_loss=0.1171, over 20521.00 frames. ], tot_loss[loss=0.1225, beats_loss=0.01335, ecapa_loss=0.0004197, whisper_loss=0.105, over 3852832.77 frames. ], batch size: 73, lr: 3.62e-02, grad_scale: 512.0 2024-08-09 16:24:52,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.49 vs. limit=22.5 2024-08-09 16:25:07,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88600.0, ans=0.1 2024-08-09 16:25:07,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0 2024-08-09 16:25:11,403 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
29 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 16:25:34,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=88800.0, ans=0.125 2024-08-09 16:25:51,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=88900.0, ans=0.125 2024-08-09 16:25:58,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=88900.0, ans=0.025 2024-08-09 16:26:01,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.897e+01 3.367e+01 4.055e+01 6.951e+01, threshold=6.734e+01, percent-clipped=1.0 2024-08-09 16:26:01,954 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8900, loss[loss=0.137, beats_loss=0.009908, ecapa_loss=0.0004738, whisper_loss=0.1223, over 23315.00 frames. ], tot_loss[loss=0.1236, beats_loss=0.0132, ecapa_loss=0.0004238, whisper_loss=0.1062, over 3847812.69 frames. ], batch size: 91, lr: 3.61e-02, grad_scale: 512.0 2024-08-09 16:26:05,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0 2024-08-09 16:26:16,190 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-09 16:26:24,494 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 16:26:24,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=89100.0, ans=0.125 2024-08-09 16:26:33,033 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.619e-01 2024-08-09 16:26:42,451 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 16:26:42,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=89300.0, ans=0.125 2024-08-09 16:26:44,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=89300.0, ans=0.125 2024-08-09 16:26:51,037 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 16:27:01,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2024-08-09 16:27:07,086 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-09 16:27:10,665 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 8950, loss[loss=0.1143, beats_loss=0.01244, ecapa_loss=0.0004885, whisper_loss=0.09693, over 22659.00 frames. ], tot_loss[loss=0.1236, beats_loss=0.01318, ecapa_loss=0.000426, whisper_loss=0.1062, over 3873938.71 frames. ], batch size: 93, lr: 3.61e-02, grad_scale: 512.0 2024-08-09 16:27:16,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0 2024-08-09 16:27:19,118 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-09 16:27:51,681 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 8 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-09 16:27:59,552 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.08 vs. 
limit=10.0 2024-08-09 16:28:02,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=89800.0, ans=0.125 2024-08-09 16:28:06,351 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-09 16:28:17,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=89900.0, ans=0.2 2024-08-09 16:28:20,148 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.962e+01 3.391e+01 3.948e+01 7.468e+01, threshold=6.781e+01, percent-clipped=1.0 2024-08-09 16:28:20,171 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9000, loss[loss=0.1028, beats_loss=0.01087, ecapa_loss=0.0005617, whisper_loss=0.08627, over 13844.00 frames. ], tot_loss[loss=0.1237, beats_loss=0.01313, ecapa_loss=0.0004276, whisper_loss=0.1063, over 3869549.37 frames. ], batch size: 56, lr: 3.60e-02, grad_scale: 512.0 2024-08-09 16:28:20,172 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-09 16:28:55,903 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4483, 2.9262, 2.2812, 2.5303], device='cuda:2') 2024-08-09 16:29:00,120 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on ASR_libri: loss=0.2932, beats_loss=0, ecapa_loss=0.001188, whisper_loss=0.2813, over 922467.00 frames. 2024-08-09 16:29:16,762 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on SV_voxceleb1: loss=0.01105, beats_loss=0, ecapa_loss=0.001105, whisper_loss=0, over 939242.00 frames. 
2024-08-09 16:30:49,185 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9506, 3.6482, 2.5462, 3.3949], device='cuda:2') 2024-08-09 16:31:15,748 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on AT_audioset: loss=0.03209, beats_loss=0.03209, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 16:31:15,751 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-09 16:31:24,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=90000.0, ans=0.125 2024-08-09 16:31:26,009 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.31 vs. limit=15.0 2024-08-09 16:31:33,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=90100.0, ans=0.0 2024-08-09 16:31:38,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90100.0, ans=0.1 2024-08-09 16:31:49,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=90200.0, ans=0.125 2024-08-09 16:31:55,207 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.37 vs. limit=22.5 2024-08-09 16:32:01,468 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-09 16:32:21,780 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 36 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 16:32:23,193 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
24 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-09 16:32:24,203 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9050, loss[loss=0.1276, beats_loss=0.01022, ecapa_loss=0.0004873, whisper_loss=0.1126, over 19120.00 frames. ], tot_loss[loss=0.1237, beats_loss=0.01303, ecapa_loss=0.0004306, whisper_loss=0.1064, over 3851231.85 frames. ], batch size: 78, lr: 3.59e-02, grad_scale: 512.0 2024-08-09 16:32:24,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=90500.0, ans=0.025 2024-08-09 16:32:29,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=15.0 2024-08-09 16:32:38,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=90600.0, ans=0.0 2024-08-09 16:32:58,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=90700.0, ans=0.125 2024-08-09 16:32:59,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2024-08-09 16:33:02,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=90700.0, ans=0.125 2024-08-09 16:33:07,210 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-09 16:33:12,817 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-09 16:33:26,280 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
24 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-09 16:33:32,739 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.994e+01 3.542e+01 4.086e+01 6.210e+01, threshold=7.084e+01, percent-clipped=0.0 2024-08-09 16:33:32,759 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9100, loss[loss=0.1185, beats_loss=0.01368, ecapa_loss=0.0003924, whisper_loss=0.1009, over 18144.00 frames. ], tot_loss[loss=0.1234, beats_loss=0.01309, ecapa_loss=0.0004303, whisper_loss=0.106, over 3833176.91 frames. ], batch size: 71, lr: 3.59e-02, grad_scale: 512.0 2024-08-09 16:33:33,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=91000.0, ans=0.0 2024-08-09 16:34:01,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=91200.0, ans=0.125 2024-08-09 16:34:23,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.43 vs. limit=10.0 2024-08-09 16:34:37,013 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-09 16:34:41,044 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9150, loss[loss=0.1322, beats_loss=0.01341, ecapa_loss=0.000388, whisper_loss=0.1149, over 22115.00 frames. ], tot_loss[loss=0.1241, beats_loss=0.01299, ecapa_loss=0.0004275, whisper_loss=0.1068, over 3839157.76 frames. 
], batch size: 86, lr: 3.58e-02, grad_scale: 512.0 2024-08-09 16:34:47,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=91500.0, ans=0.125 2024-08-09 16:34:51,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=91500.0, ans=0.025 2024-08-09 16:34:59,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91600.0, ans=0.1 2024-08-09 16:35:05,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=91600.0, ans=0.0 2024-08-09 16:35:14,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91700.0, ans=0.1 2024-08-09 16:35:18,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.37 vs. limit=10.0 2024-08-09 16:35:40,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=91900.0, ans=10.0 2024-08-09 16:35:49,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.827e+01 3.202e+01 3.925e+01 7.636e+01, threshold=6.404e+01, percent-clipped=0.0 2024-08-09 16:35:49,449 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9200, loss[loss=0.1163, beats_loss=0.01251, ecapa_loss=0.0004298, whisper_loss=0.09952, over 16429.00 frames. ], tot_loss[loss=0.1233, beats_loss=0.01302, ecapa_loss=0.0004257, whisper_loss=0.106, over 3836465.15 frames. 
], batch size: 64, lr: 3.58e-02, grad_scale: 512.0 2024-08-09 16:35:49,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=92000.0, ans=0.035 2024-08-09 16:35:56,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=92000.0, ans=0.0 2024-08-09 16:36:02,406 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-09 16:36:21,974 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-09 16:36:26,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=92200.0, ans=0.125 2024-08-09 16:36:27,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=92200.0, ans=0.125 2024-08-09 16:36:34,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92300.0, ans=0.1 2024-08-09 16:36:47,754 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-09 16:36:58,909 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9250, loss[loss=0.1217, beats_loss=0.01072, ecapa_loss=0.0003984, whisper_loss=0.107, over 18663.00 frames. ], tot_loss[loss=0.1232, beats_loss=0.013, ecapa_loss=0.0004252, whisper_loss=0.106, over 3861496.67 frames. ], batch size: 71, lr: 3.57e-02, grad_scale: 512.0 2024-08-09 16:36:59,201 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-09 16:37:04,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=92500.0, ans=0.125 2024-08-09 16:37:06,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=92500.0, ans=0.2 2024-08-09 16:37:22,670 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 16:37:23,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92600.0, ans=0.1 2024-08-09 16:37:29,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92700.0, ans=0.1 2024-08-09 16:37:42,436 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 16:38:03,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=92900.0, ans=0.025 2024-08-09 16:38:04,364 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 16:38:07,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.318e+01 3.067e+01 3.450e+01 4.093e+01 6.352e+01, threshold=6.900e+01, percent-clipped=1.0 2024-08-09 16:38:07,098 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9300, loss[loss=0.1428, beats_loss=0.01337, ecapa_loss=0.000359, whisper_loss=0.1258, over 22280.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.0131, ecapa_loss=0.0004231, whisper_loss=0.1049, over 3898727.49 frames. ], batch size: 84, lr: 3.57e-02, grad_scale: 512.0 2024-08-09 16:38:10,199 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
16 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-09 16:38:11,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=93000.0, ans=0.125 2024-08-09 16:38:32,317 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 16:38:35,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=93200.0, ans=0.2 2024-08-09 16:38:43,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=93200.0, ans=0.0 2024-08-09 16:38:50,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=93300.0, ans=0.125 2024-08-09 16:38:53,539 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.55 vs. limit=15.0 2024-08-09 16:39:04,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=93400.0, ans=0.125 2024-08-09 16:39:06,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.59 vs. limit=22.5 2024-08-09 16:39:15,990 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9350, loss[loss=0.1045, beats_loss=0.01047, ecapa_loss=0.0004552, whisper_loss=0.08945, over 17238.00 frames. ], tot_loss[loss=0.122, beats_loss=0.01317, ecapa_loss=0.0004235, whisper_loss=0.1046, over 3860898.99 frames. 
], batch size: 66, lr: 3.56e-02, grad_scale: 512.0 2024-08-09 16:39:20,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=93500.0, ans=0.125 2024-08-09 16:39:29,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2024-08-09 16:39:34,465 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-09 16:39:41,091 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-09 16:39:49,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=93700.0, ans=0.125 2024-08-09 16:39:54,459 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-09 16:40:14,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=93900.0, ans=0.125 2024-08-09 16:40:17,877 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 16:40:20,304 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-09 16:40:24,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 2.921e+01 3.226e+01 3.791e+01 1.210e+02, threshold=6.451e+01, percent-clipped=3.0 2024-08-09 16:40:24,494 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9400, loss[loss=0.1275, beats_loss=0.0145, ecapa_loss=0.0003814, whisper_loss=0.1092, over 22134.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.0133, ecapa_loss=0.0004203, whisper_loss=0.1039, over 3875025.35 frames. 
], batch size: 85, lr: 3.55e-02, grad_scale: 512.0 2024-08-09 16:40:47,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=94100.0, ans=0.0 2024-08-09 16:40:49,558 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-09 16:41:00,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=94200.0, ans=0.2 2024-08-09 16:41:03,028 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-09 16:41:07,402 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-09 16:41:18,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=94400.0, ans=0.125 2024-08-09 16:41:24,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94400.0, ans=0.1 2024-08-09 16:41:25,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=94400.0, ans=0.0 2024-08-09 16:41:30,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=94400.0, ans=0.125 2024-08-09 16:41:32,639 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9450, loss[loss=0.1095, beats_loss=0.01421, ecapa_loss=0.00038, whisper_loss=0.09152, over 22801.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01331, ecapa_loss=0.0004192, whisper_loss=0.1029, over 3871040.41 frames. ], batch size: 90, lr: 3.55e-02, grad_scale: 512.0 2024-08-09 16:41:35,014 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.74 vs. 
limit=15.0 2024-08-09 16:41:50,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=8.0 2024-08-09 16:41:59,908 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-09 16:42:02,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=94700.0, ans=0.125 2024-08-09 16:42:04,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=94700.0, ans=0.2 2024-08-09 16:42:12,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=94800.0, ans=0.125 2024-08-09 16:42:19,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=94800.0, ans=0.5 2024-08-09 16:42:19,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=94800.0, ans=0.125 2024-08-09 16:42:19,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=94800.0, ans=0.125 2024-08-09 16:42:20,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94800.0, ans=0.1 2024-08-09 16:42:21,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=94800.0, ans=0.2 2024-08-09 16:42:25,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=94900.0, ans=0.125 2024-08-09 16:42:30,208 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.86 vs. 
limit=22.5 2024-08-09 16:42:40,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.979e+01 3.573e+01 4.112e+01 7.498e+01, threshold=7.146e+01, percent-clipped=2.0 2024-08-09 16:42:40,497 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9500, loss[loss=0.1143, beats_loss=0.01319, ecapa_loss=0.0004731, whisper_loss=0.09641, over 21324.00 frames. ], tot_loss[loss=0.1197, beats_loss=0.01337, ecapa_loss=0.00042, whisper_loss=0.1021, over 3873526.29 frames. ], batch size: 91, lr: 3.54e-02, grad_scale: 512.0 2024-08-09 16:42:53,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95100.0, ans=0.1 2024-08-09 16:42:56,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=95100.0, ans=0.125 2024-08-09 16:42:58,310 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 16:42:59,051 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0 2024-08-09 16:43:03,846 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 11 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 16:43:21,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=95300.0, ans=0.125 2024-08-09 16:43:27,779 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.27 vs. 
limit=6.0 2024-08-09 16:43:33,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=95300.0, ans=0.2 2024-08-09 16:43:48,736 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9550, loss[loss=0.1216, beats_loss=0.0136, ecapa_loss=0.0004209, whisper_loss=0.1038, over 19148.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01337, ecapa_loss=0.000421, whisper_loss=0.1026, over 3861481.47 frames. ], batch size: 79, lr: 3.54e-02, grad_scale: 512.0 2024-08-09 16:43:53,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=95500.0, ans=0.125 2024-08-09 16:43:59,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.64 vs. limit=10.0 2024-08-09 16:44:13,941 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-09 16:44:16,820 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-09 16:44:17,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=95700.0, ans=0.125 2024-08-09 16:44:27,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=95700.0, ans=0.0 2024-08-09 16:44:41,127 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-09 16:44:45,270 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 16:44:52,843 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-09 16:44:55,644 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 16:44:56,685 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 3.093e+01 3.544e+01 4.156e+01 7.056e+01, threshold=7.088e+01, percent-clipped=0.0 2024-08-09 16:44:56,710 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9600, loss[loss=0.1173, beats_loss=0.01344, ecapa_loss=0.0004012, whisper_loss=0.09986, over 19772.00 frames. ], tot_loss[loss=0.1205, beats_loss=0.01331, ecapa_loss=0.0004228, whisper_loss=0.1029, over 3865930.98 frames. ], batch size: 77, lr: 3.53e-02, grad_scale: 512.0 2024-08-09 16:44:57,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=96000.0, ans=0.125 2024-08-09 16:45:04,501 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 17 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 16:45:10,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=96100.0, ans=0.125 2024-08-09 16:45:18,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=96100.0, ans=0.0 2024-08-09 16:45:21,213 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 16:45:33,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.26 vs. 
limit=10.0 2024-08-09 16:45:37,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=96300.0, ans=0.125 2024-08-09 16:45:51,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=96400.0, ans=0.125 2024-08-09 16:46:04,511 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9650, loss[loss=0.1203, beats_loss=0.01274, ecapa_loss=0.0004827, whisper_loss=0.1027, over 17351.00 frames. ], tot_loss[loss=0.1193, beats_loss=0.01338, ecapa_loss=0.0004216, whisper_loss=0.1018, over 3827658.00 frames. ], batch size: 71, lr: 3.53e-02, grad_scale: 512.0 2024-08-09 16:46:26,912 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.48 vs. limit=15.0 2024-08-09 16:46:27,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=96600.0, ans=0.125 2024-08-09 16:46:39,670 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-09 16:46:47,704 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-09 16:46:49,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=96800.0, ans=0.0 2024-08-09 16:47:10,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.53 vs. limit=15.0 2024-08-09 16:47:11,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=97000.0, ans=0.0 2024-08-09 16:47:12,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.22 vs. 
limit=10.0 2024-08-09 16:47:12,611 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.968e+01 3.449e+01 4.387e+01 7.611e+01, threshold=6.898e+01, percent-clipped=2.0 2024-08-09 16:47:12,631 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9700, loss[loss=0.1003, beats_loss=0.01359, ecapa_loss=0.0003779, whisper_loss=0.08289, over 14482.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01344, ecapa_loss=0.000419, whisper_loss=0.1027, over 3852837.94 frames. ], batch size: 56, lr: 3.52e-02, grad_scale: 512.0 2024-08-09 16:47:52,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97200.0, ans=0.1 2024-08-09 16:47:57,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=97300.0, ans=0.125 2024-08-09 16:47:58,929 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-09 16:48:03,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=97300.0, ans=0.125 2024-08-09 16:48:15,009 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=12.0 2024-08-09 16:48:15,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=97400.0, ans=0.2 2024-08-09 16:48:22,094 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9750, loss[loss=0.1241, beats_loss=0.01294, ecapa_loss=0.0004349, whisper_loss=0.1068, over 18103.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01332, ecapa_loss=0.0004154, whisper_loss=0.1028, over 3865284.15 frames. ], batch size: 71, lr: 3.51e-02, grad_scale: 512.0 2024-08-09 16:48:40,499 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 16:48:40,969 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.478e-01 2024-08-09 16:48:41,893 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 16:48:47,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97600.0, ans=0.1 2024-08-09 16:49:09,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=97800.0, ans=0.2 2024-08-09 16:49:12,029 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 16:49:24,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=97900.0, ans=0.0 2024-08-09 16:49:24,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=97900.0, ans=0.125 2024-08-09 16:49:31,359 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.868e+01 3.333e+01 3.887e+01 7.337e+01, threshold=6.667e+01, percent-clipped=1.0 2024-08-09 16:49:31,379 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9800, loss[loss=0.09496, beats_loss=0.01319, ecapa_loss=0.0004194, whisper_loss=0.07758, over 16748.00 frames. ], tot_loss[loss=0.1212, beats_loss=0.01325, ecapa_loss=0.0004151, whisper_loss=0.1038, over 3863766.26 frames. ], batch size: 65, lr: 3.51e-02, grad_scale: 512.0 2024-08-09 16:49:31,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98000.0, ans=0.1 2024-08-09 16:49:37,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.95 vs. 
limit=22.5 2024-08-09 16:49:44,629 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-09 16:50:12,043 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 16:50:28,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=98300.0, ans=0.0 2024-08-09 16:50:31,393 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-09 16:50:45,702 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 16:50:47,321 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9850, loss[loss=0.1123, beats_loss=0.01415, ecapa_loss=0.0003751, whisper_loss=0.09436, over 22290.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01319, ecapa_loss=0.0004171, whisper_loss=0.1043, over 3880165.73 frames. ], batch size: 89, lr: 3.50e-02, grad_scale: 512.0 2024-08-09 16:50:49,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=98500.0, ans=0.0 2024-08-09 16:51:01,481 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 13 from Vox, 46 fro AS 2024-08-09 16:51:03,224 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 16:51:06,219 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2024-08-09 16:51:43,614 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-09 16:51:43,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=98800.0, ans=0.125 2024-08-09 16:51:47,852 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-09 16:51:49,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=98800.0, ans=0.07 2024-08-09 16:51:54,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=15.0 2024-08-09 16:52:05,364 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-09 16:52:09,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=98900.0, ans=0.125 2024-08-09 16:52:11,598 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.990e+01 3.470e+01 4.121e+01 8.675e+01, threshold=6.939e+01, percent-clipped=3.0 2024-08-09 16:52:11,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9900, loss[loss=0.1455, beats_loss=0.009863, ecapa_loss=0.0004322, whisper_loss=0.1313, over 18798.00 frames. ], tot_loss[loss=0.1218, beats_loss=0.01326, ecapa_loss=0.0004151, whisper_loss=0.1044, over 3849173.79 frames. ], batch size: 74, lr: 3.50e-02, grad_scale: 512.0 2024-08-09 16:52:26,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=99000.0, ans=22.5 2024-08-09 16:52:30,559 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 16:52:34,895 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.51 vs. limit=22.5 2024-08-09 16:52:35,580 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-09 16:52:36,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.78 vs. 
limit=22.5 2024-08-09 16:52:40,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0 2024-08-09 16:53:01,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=99200.0, ans=22.5 2024-08-09 16:53:05,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=99300.0, ans=0.125 2024-08-09 16:53:34,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=99500.0, ans=0.0 2024-08-09 16:53:35,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 9950, loss[loss=0.114, beats_loss=0.01255, ecapa_loss=0.0004582, whisper_loss=0.09689, over 17145.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01321, ecapa_loss=0.0004156, whisper_loss=0.1044, over 3857722.80 frames. ], batch size: 70, lr: 3.49e-02, grad_scale: 512.0 2024-08-09 16:53:39,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99500.0, ans=0.1 2024-08-09 16:53:41,835 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 37 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-09 16:53:45,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=99500.0, ans=0.125 2024-08-09 16:54:07,755 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
27 from LS+wenet, 18 from Vox, 15 fro AS 2024-08-09 16:54:18,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99700.0, ans=0.1 2024-08-09 16:54:31,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=99800.0, ans=0.125 2024-08-09 16:54:34,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=99800.0, ans=10.0 2024-08-09 16:54:37,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=99800.0, ans=0.125 2024-08-09 16:54:53,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.893e+01 3.392e+01 3.870e+01 8.367e+01, threshold=6.783e+01, percent-clipped=1.0 2024-08-09 16:54:53,732 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10000, loss[loss=0.1248, beats_loss=0.01237, ecapa_loss=0.0003848, whisper_loss=0.1086, over 21528.00 frames. ], tot_loss[loss=0.1223, beats_loss=0.01315, ecapa_loss=0.0004162, whisper_loss=0.105, over 3864070.13 frames. ], batch size: 85, lr: 3.49e-02, grad_scale: 1024.0 2024-08-09 16:55:26,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=100200.0, ans=0.0 2024-08-09 16:55:35,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=100300.0, ans=0.0 2024-08-09 16:55:39,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=100300.0, ans=0.2 2024-08-09 16:55:51,463 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-09 16:56:03,776 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10050, loss[loss=0.1224, beats_loss=0.01391, ecapa_loss=0.0003559, whisper_loss=0.1049, over 18086.00 frames. ], tot_loss[loss=0.1221, beats_loss=0.01316, ecapa_loss=0.0004127, whisper_loss=0.1048, over 3877078.86 frames. ], batch size: 68, lr: 3.48e-02, grad_scale: 1024.0 2024-08-09 16:56:11,240 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 22 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-09 16:56:20,898 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 16:56:25,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=100600.0, ans=0.125 2024-08-09 16:56:30,716 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-09 16:56:32,039 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
27 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-09 16:56:41,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=100700.0, ans=0.1 2024-08-09 16:56:48,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=100800.0, ans=0.0 2024-08-09 16:56:48,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=100800.0, ans=0.125 2024-08-09 16:56:55,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=100800.0, ans=0.015 2024-08-09 16:57:12,749 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.921e+01 3.378e+01 4.111e+01 6.632e+01, threshold=6.756e+01, percent-clipped=0.0 2024-08-09 16:57:12,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10100, loss[loss=0.1233, beats_loss=0.01496, ecapa_loss=0.0003424, whisper_loss=0.105, over 22926.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01324, ecapa_loss=0.0004106, whisper_loss=0.1043, over 3890044.30 frames. ], batch size: 90, lr: 3.47e-02, grad_scale: 1024.0 2024-08-09 16:57:25,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=101100.0, ans=0.0 2024-08-09 16:57:29,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=101100.0, ans=0.0 2024-08-09 16:57:32,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2024-08-09 16:57:36,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.42 vs. 
limit=15.0 2024-08-09 16:57:38,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=101200.0, ans=0.0 2024-08-09 16:57:41,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=22.5 2024-08-09 16:57:43,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=101200.0, ans=0.0 2024-08-09 16:57:45,886 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-09 16:57:46,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=101200.0, ans=0.125 2024-08-09 16:57:50,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=101200.0, ans=0.0 2024-08-09 16:57:59,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-08-09 16:58:00,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=101300.0, ans=0.2 2024-08-09 16:58:10,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=101400.0, ans=0.125 2024-08-09 16:58:14,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=101400.0, ans=0.2 2024-08-09 16:58:20,642 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10150, loss[loss=0.1167, beats_loss=0.01476, ecapa_loss=0.0003617, whisper_loss=0.09837, over 22571.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01327, ecapa_loss=0.0004121, whisper_loss=0.1042, over 3911300.51 frames. 
], batch size: 94, lr: 3.47e-02, grad_scale: 1024.0 2024-08-09 16:58:35,939 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 16:58:53,185 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-09 16:58:54,497 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 16:59:02,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=101800.0, ans=0.0 2024-08-09 16:59:03,954 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 21 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-09 16:59:14,250 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.64 vs. limit=22.5 2024-08-09 16:59:17,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=101900.0, ans=0.125 2024-08-09 16:59:21,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=101900.0, ans=0.0 2024-08-09 16:59:29,987 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.923e+01 3.411e+01 4.089e+01 6.898e+01, threshold=6.822e+01, percent-clipped=2.0 2024-08-09 16:59:30,009 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10200, loss[loss=0.1065, beats_loss=0.01394, ecapa_loss=0.0004159, whisper_loss=0.08845, over 16453.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01316, ecapa_loss=0.00041, whisper_loss=0.1044, over 3900556.78 frames. ], batch size: 65, lr: 3.46e-02, grad_scale: 1024.0 2024-08-09 16:59:39,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.99 vs. 
limit=15.0 2024-08-09 17:00:03,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=102200.0, ans=0.95 2024-08-09 17:00:14,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=102300.0, ans=0.0 2024-08-09 17:00:15,611 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 17:00:27,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=102400.0, ans=0.0 2024-08-09 17:00:38,835 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10250, loss[loss=0.1141, beats_loss=0.01503, ecapa_loss=0.0003676, whisper_loss=0.09536, over 23257.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.01317, ecapa_loss=0.0004089, whisper_loss=0.1049, over 3930758.11 frames. ], batch size: 93, lr: 3.46e-02, grad_scale: 1024.0 2024-08-09 17:00:56,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=102600.0, ans=22.5 2024-08-09 17:01:00,798 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 17:01:24,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102800.0, ans=0.1 2024-08-09 17:01:29,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=102800.0, ans=0.1 2024-08-09 17:01:39,490 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 17:01:46,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=103000.0, ans=0.125 2024-08-09 17:01:47,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.938e+01 3.467e+01 4.292e+01 7.706e+01, threshold=6.934e+01, percent-clipped=1.0 2024-08-09 17:01:47,242 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10300, loss[loss=0.1233, beats_loss=0.01355, ecapa_loss=0.0004347, whisper_loss=0.1055, over 22794.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01322, ecapa_loss=0.0004068, whisper_loss=0.1042, over 3905791.09 frames. ], batch size: 91, lr: 3.45e-02, grad_scale: 1024.0 2024-08-09 17:01:49,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=103000.0, ans=0.125 2024-08-09 17:02:01,370 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-09 17:02:05,297 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-09 17:02:16,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=103200.0, ans=0.125 2024-08-09 17:02:29,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103300.0, ans=0.1 2024-08-09 17:02:36,067 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-09 17:02:42,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=103400.0, ans=0.0 2024-08-09 17:02:46,873 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
25 from LS+wenet, 14 from Vox, 26 from AS 2024-08-09 17:02:54,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10350, loss[loss=0.116, beats_loss=0.01343, ecapa_loss=0.0004571, whisper_loss=0.09804, over 20488.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01328, ecapa_loss=0.0004088, whisper_loss=0.104, over 3922855.52 frames. ], batch size: 86, lr: 3.45e-02, grad_scale: 1024.0 2024-08-09 17:02:56,206 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 20 from LS+wenet, 25 from Vox, 48 from AS 2024-08-09 17:02:59,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=103500.0, ans=0.0 2024-08-09 17:03:13,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=103600.0, ans=0.125 2024-08-09 17:03:18,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=15.0 2024-08-09 17:03:21,301 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2024-08-09 17:03:33,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.62 vs. limit=10.0 2024-08-09 17:03:34,477 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 from AS 2024-08-09 17:03:37,302 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
29 from LS+wenet, 22 from Vox, 40 from AS 2024-08-09 17:03:37,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=103800.0, ans=0.0 2024-08-09 17:03:44,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103800.0, ans=0.1 2024-08-09 17:04:02,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 3.016e+01 3.413e+01 4.405e+01 7.924e+01, threshold=6.827e+01, percent-clipped=1.0 2024-08-09 17:04:02,886 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10400, loss[loss=0.1108, beats_loss=0.01159, ecapa_loss=0.0003848, whisper_loss=0.09532, over 18944.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01332, ecapa_loss=0.000407, whisper_loss=0.1043, over 3913700.52 frames. ], batch size: 75, lr: 3.44e-02, grad_scale: 1024.0 2024-08-09 17:04:03,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=104000.0, ans=0.125 2024-08-09 17:04:13,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=104000.0, ans=0.2 2024-08-09 17:04:19,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.76 vs. limit=22.5 2024-08-09 17:04:19,911 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts.
25 from LS+wenet, 26 from Vox, 29 from AS 2024-08-09 17:04:24,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=104100.0, ans=15.0 2024-08-09 17:04:44,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=104300.0, ans=0.125 2024-08-09 17:04:51,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=104300.0, ans=0.125 2024-08-09 17:04:57,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=104400.0, ans=0.05 2024-08-09 17:04:58,744 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 20 from Vox, 19 from AS 2024-08-09 17:05:12,211 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10450, loss[loss=0.09438, beats_loss=0.015, ecapa_loss=0.0003789, whisper_loss=0.07559, over 14298.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01323, ecapa_loss=0.0004061, whisper_loss=0.1041, over 3872401.02 frames. ], batch size: 59, lr: 3.44e-02, grad_scale: 1024.0 2024-08-09 17:05:14,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=104500.0, ans=0.05 2024-08-09 17:05:23,761 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 from AS 2024-08-09 17:05:26,356 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 from AS 2024-08-09 17:05:27,909 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 30 from LS+wenet, 15 from Vox, 28 from AS 2024-08-09 17:05:29,288 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
22 from LS+wenet, 19 from Vox, 49 from AS 2024-08-09 17:05:34,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=104600.0, ans=0.125 2024-08-09 17:05:55,294 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 10 from Vox, 37 from AS 2024-08-09 17:05:58,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=104800.0, ans=0.2 2024-08-09 17:06:04,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=104800.0, ans=0.0 2024-08-09 17:06:15,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=104900.0, ans=0.0 2024-08-09 17:06:17,624 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.54 vs. limit=15.0 2024-08-09 17:06:22,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.355e+01 3.012e+01 3.451e+01 3.999e+01 6.423e+01, threshold=6.903e+01, percent-clipped=0.0 2024-08-09 17:06:22,504 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10500, loss[loss=0.1428, beats_loss=0.01094, ecapa_loss=0.0004379, whisper_loss=0.1275, over 22947.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01328, ecapa_loss=0.0004056, whisper_loss=0.1043, over 3881111.59 frames. ], batch size: 91, lr: 3.43e-02, grad_scale: 1024.0 2024-08-09 17:06:23,196 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.41 vs.
limit=15.0 2024-08-09 17:06:23,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=105000.0, ans=15.0 2024-08-09 17:06:39,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=105100.0, ans=0.2 2024-08-09 17:06:44,848 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS 2024-08-09 17:06:45,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=105100.0, ans=0.125 2024-08-09 17:06:54,004 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2024-08-09 17:07:03,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=105300.0, ans=0.0 2024-08-09 17:07:03,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=105300.0, ans=0.0 2024-08-09 17:07:10,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=105300.0, ans=0.125 2024-08-09 17:07:13,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0 2024-08-09 17:07:17,052 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 from AS 2024-08-09 17:07:27,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=105400.0, ans=0.015 2024-08-09 17:07:28,003 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
33 from LS+wenet, 24 from Vox, 37 from AS 2024-08-09 17:07:32,201 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10550, loss[loss=0.1276, beats_loss=0.01205, ecapa_loss=0.0003652, whisper_loss=0.1119, over 23835.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01321, ecapa_loss=0.0004068, whisper_loss=0.1044, over 3871053.92 frames. ], batch size: 94, lr: 3.43e-02, grad_scale: 1024.0 2024-08-09 17:07:41,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.88 vs. limit=10.0 2024-08-09 17:07:50,176 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 14 from LS+wenet, 19 from Vox, 32 from AS 2024-08-09 17:07:57,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105600.0, ans=0.1 2024-08-09 17:08:13,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.19 vs. limit=15.0 2024-08-09 17:08:17,006 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 27 from Vox, 32 from AS 2024-08-09 17:08:23,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=105800.0, ans=0.0 2024-08-09 17:08:30,619 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 25 from LS+wenet, 25 from Vox, 45 from AS 2024-08-09 17:08:33,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.66 vs. limit=10.0 2024-08-09 17:08:41,496 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.989e+01 3.482e+01 4.095e+01 9.318e+01, threshold=6.964e+01, percent-clipped=2.0 2024-08-09 17:08:41,516 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10600, loss[loss=0.1075, beats_loss=0.01304, ecapa_loss=0.0004071, whisper_loss=0.0904, over 19137.00 frames.
], tot_loss[loss=0.1217, beats_loss=0.01325, ecapa_loss=0.0004088, whisper_loss=0.1044, over 3887334.65 frames. ], batch size: 80, lr: 3.42e-02, grad_scale: 1024.0 2024-08-09 17:08:47,068 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 17 from Vox, 33 from AS 2024-08-09 17:08:51,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=106000.0, ans=0.0 2024-08-09 17:09:18,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=106200.0, ans=0.1 2024-08-09 17:09:22,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=106300.0, ans=0.0 2024-08-09 17:09:26,557 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 from AS 2024-08-09 17:09:32,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=106300.0, ans=0.125 2024-08-09 17:09:41,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=106400.0, ans=0.0 2024-08-09 17:09:51,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10650, loss[loss=0.1369, beats_loss=0.01083, ecapa_loss=0.0004185, whisper_loss=0.1219, over 15806.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01326, ecapa_loss=0.0004057, whisper_loss=0.1042, over 3860894.98 frames. ], batch size: 63, lr: 3.41e-02, grad_scale: 1024.0 2024-08-09 17:09:52,402 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.10 vs.
limit=15.0 2024-08-09 17:09:53,468 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 17:09:59,352 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2024-08-09 17:10:17,159 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 from AS 2024-08-09 17:10:19,918 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 37 from LS+wenet, 27 from Vox, 24 from AS 2024-08-09 17:10:28,231 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 14 from LS+wenet, 19 from Vox, 34 from AS 2024-08-09 17:10:30,830 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 from AS 2024-08-09 17:10:35,265 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 from AS 2024-08-09 17:10:44,988 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 34 from LS+wenet, 28 from Vox, 32 from AS 2024-08-09 17:11:01,583 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 3.109e+01 3.454e+01 4.119e+01 5.374e+01, threshold=6.908e+01, percent-clipped=0.0 2024-08-09 17:11:01,603 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10700, loss[loss=0.1005, beats_loss=0.01592, ecapa_loss=0.0002916, whisper_loss=0.08165, over 16033.00 frames. ], tot_loss[loss=0.121, beats_loss=0.01335, ecapa_loss=0.0004045, whisper_loss=0.1036, over 3856465.32 frames. ], batch size: 61, lr: 3.41e-02, grad_scale: 1024.0 2024-08-09 17:11:41,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0 2024-08-09 17:11:48,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs.
limit=6.0 2024-08-09 17:12:00,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=107400.0, ans=0.1 2024-08-09 17:12:10,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10750, loss[loss=0.1479, beats_loss=0.009298, ecapa_loss=0.0004174, whisper_loss=0.1344, over 20163.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01324, ecapa_loss=0.0004026, whisper_loss=0.1044, over 3870934.12 frames. ], batch size: 78, lr: 3.40e-02, grad_scale: 1024.0 2024-08-09 17:12:16,002 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 22 from Vox, 24 from AS 2024-08-09 17:12:21,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=107500.0, ans=0.0 2024-08-09 17:12:40,389 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=23.22 vs. limit=15.0 2024-08-09 17:13:02,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=107800.0, ans=0.0 2024-08-09 17:13:03,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=107900.0, ans=0.0 2024-08-09 17:13:06,244 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 from AS 2024-08-09 17:13:10,964 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.84 vs. limit=12.0 2024-08-09 17:13:18,079 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.960e+01 3.558e+01 4.572e+01 9.073e+01, threshold=7.116e+01, percent-clipped=3.0 2024-08-09 17:13:18,099 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10800, loss[loss=0.1128, beats_loss=0.01495, ecapa_loss=0.0004276, whisper_loss=0.0936, over 20750.00 frames.
], tot_loss[loss=0.121, beats_loss=0.01324, ecapa_loss=0.0004033, whisper_loss=0.1037, over 3852831.39 frames. ], batch size: 86, lr: 3.40e-02, grad_scale: 1024.0 2024-08-09 17:13:34,910 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 from AS 2024-08-09 17:13:42,988 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 11 from Vox, 32 from AS 2024-08-09 17:13:54,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=108200.0, ans=0.125 2024-08-09 17:13:56,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=108200.0, ans=0.2 2024-08-09 17:13:59,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=108300.0, ans=0.1 2024-08-09 17:14:18,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=108400.0, ans=0.125 2024-08-09 17:14:21,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=108400.0, ans=0.04949747468305833 2024-08-09 17:14:26,244 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10850, loss[loss=0.1277, beats_loss=0.01407, ecapa_loss=0.0003164, whisper_loss=0.1104, over 23181.00 frames. ], tot_loss[loss=0.1207, beats_loss=0.01327, ecapa_loss=0.0003998, whisper_loss=0.1034, over 3854717.96 frames. ], batch size: 88, lr: 3.39e-02, grad_scale: 1024.0 2024-08-09 17:14:26,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=108500.0, ans=0.2 2024-08-09 17:14:29,324 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 from AS 2024-08-09 17:14:32,369 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts.
24 from LS+wenet, 25 from Vox, 44 from AS 2024-08-09 17:14:36,875 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 29 from Vox, 25 from AS 2024-08-09 17:14:38,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=108500.0, ans=0.05 2024-08-09 17:15:09,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=108800.0, ans=0.1 2024-08-09 17:15:11,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=108800.0, ans=0.125 2024-08-09 17:15:35,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 3.154e+01 3.497e+01 4.138e+01 7.474e+01, threshold=6.993e+01, percent-clipped=1.0 2024-08-09 17:15:35,503 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10900, loss[loss=0.1193, beats_loss=0.01339, ecapa_loss=0.0003856, whisper_loss=0.102, over 21493.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01326, ecapa_loss=0.0003992, whisper_loss=0.1035, over 3868391.56 frames. ], batch size: 90, lr: 3.39e-02, grad_scale: 1024.0 2024-08-09 17:15:37,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=109000.0, ans=0.125 2024-08-09 17:15:55,524 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.129e+00 2024-08-09 17:16:10,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=109200.0, ans=0.05 2024-08-09 17:16:34,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=109400.0, ans=0.95 2024-08-09 17:16:35,478 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
29 from LS+wenet, 24 from Vox, 37 from AS 2024-08-09 17:16:43,652 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 10950, loss[loss=0.1126, beats_loss=0.01426, ecapa_loss=0.0003553, whisper_loss=0.09478, over 21087.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01323, ecapa_loss=0.0003978, whisper_loss=0.1042, over 3894900.07 frames. ], batch size: 84, lr: 3.38e-02, grad_scale: 1024.0 2024-08-09 17:16:46,635 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 from AS 2024-08-09 17:16:50,781 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 from AS 2024-08-09 17:16:52,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=109500.0, ans=0.07 2024-08-09 17:16:55,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=109500.0, ans=0.2 2024-08-09 17:17:04,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=109600.0, ans=0.125 2024-08-09 17:17:15,401 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 20 from Vox, 40 from AS 2024-08-09 17:17:51,532 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.947e+01 3.240e+01 3.931e+01 5.659e+01, threshold=6.481e+01, percent-clipped=0.0 2024-08-09 17:17:51,552 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11000, loss[loss=0.102, beats_loss=0.01496, ecapa_loss=0.0003212, whisper_loss=0.08385, over 22138.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01312, ecapa_loss=0.0003999, whisper_loss=0.1046, over 3916158.56 frames.
], batch size: 91, lr: 3.38e-02, grad_scale: 1024.0 2024-08-09 17:17:55,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=110000.0, ans=0.0 2024-08-09 17:18:19,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=15.0 2024-08-09 17:18:20,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=110200.0, ans=0.025 2024-08-09 17:18:30,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.13 vs. limit=15.0 2024-08-09 17:18:40,923 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2024-08-09 17:18:46,221 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0 2024-08-09 17:18:51,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=110400.0, ans=0.1 2024-08-09 17:19:00,731 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11050, loss[loss=0.1167, beats_loss=0.01285, ecapa_loss=0.0004856, whisper_loss=0.09896, over 19855.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01308, ecapa_loss=0.0003987, whisper_loss=0.1043, over 3923043.07 frames. ], batch size: 89, lr: 3.37e-02, grad_scale: 1024.0 2024-08-09 17:19:18,723 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 24 from Vox, 20 from AS 2024-08-09 17:19:32,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.66 vs.
limit=8.0 2024-08-09 17:19:32,724 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 24 from Vox, 25 from AS 2024-08-09 17:19:33,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=110700.0, ans=0.0 2024-08-09 17:20:04,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=110900.0, ans=0.125 2024-08-09 17:20:09,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111000.0, ans=0.1 2024-08-09 17:20:10,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 3.030e+01 3.567e+01 4.272e+01 6.137e+01, threshold=7.134e+01, percent-clipped=0.0 2024-08-09 17:20:10,358 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11100, loss[loss=0.1082, beats_loss=0.01398, ecapa_loss=0.0004016, whisper_loss=0.09025, over 19640.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01306, ecapa_loss=0.0004, whisper_loss=0.1043, over 3930928.54 frames. ], batch size: 82, lr: 3.37e-02, grad_scale: 1024.0 2024-08-09 17:20:34,068 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts.
24 from LS+wenet, 15 from Vox, 29 from AS 2024-08-09 17:20:35,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=111100.0, ans=0.125 2024-08-09 17:20:38,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=111200.0, ans=0.125 2024-08-09 17:20:57,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=111300.0, ans=0.125 2024-08-09 17:21:07,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=111400.0, ans=0.2 2024-08-09 17:21:10,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=111400.0, ans=0.1 2024-08-09 17:21:10,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=111400.0, ans=0.125 2024-08-09 17:21:15,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=111400.0, ans=0.07 2024-08-09 17:21:18,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.97 vs. limit=15.0 2024-08-09 17:21:19,225 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11150, loss[loss=0.1113, beats_loss=0.01514, ecapa_loss=0.0004227, whisper_loss=0.09196, over 22703.00 frames. ], tot_loss[loss=0.1211, beats_loss=0.0131, ecapa_loss=0.0003958, whisper_loss=0.104, over 3912455.82 frames. ], batch size: 94, lr: 3.36e-02, grad_scale: 1024.0 2024-08-09 17:21:21,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=111500.0, ans=0.025 2024-08-09 17:21:22,203 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts.
15 from LS+wenet, 22 from Vox, 21 from AS 2024-08-09 17:21:23,546 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 25 from Vox, 34 from AS 2024-08-09 17:21:36,580 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5 2024-08-09 17:21:45,707 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 30 from Vox, 32 from AS 2024-08-09 17:21:47,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111700.0, ans=0.1 2024-08-09 17:21:53,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.33 vs. limit=22.5 2024-08-09 17:21:55,539 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 from AS 2024-08-09 17:21:57,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=111700.0, ans=0.125 2024-08-09 17:21:57,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=111700.0, ans=0.125 2024-08-09 17:22:16,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=111900.0, ans=0.07 2024-08-09 17:22:28,499 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.940e+01 3.532e+01 4.042e+01 6.455e+01, threshold=7.065e+01, percent-clipped=0.0 2024-08-09 17:22:28,519 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11200, loss[loss=0.1444, beats_loss=0.01059, ecapa_loss=0.0004563, whisper_loss=0.1293, over 23331.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01302, ecapa_loss=0.0003982, whisper_loss=0.1046, over 3932951.57 frames.
], batch size: 92, lr: 3.36e-02, grad_scale: 1024.0 2024-08-09 17:22:30,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=112000.0, ans=0.2 2024-08-09 17:22:33,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=112000.0, ans=0.125 2024-08-09 17:22:41,242 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 from AS 2024-08-09 17:22:44,377 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.970e-01 2024-08-09 17:22:54,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=112100.0, ans=0.125 2024-08-09 17:23:05,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=112200.0, ans=0.125 2024-08-09 17:23:09,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=112300.0, ans=0.125 2024-08-09 17:23:14,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=112300.0, ans=0.125 2024-08-09 17:23:25,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-09 17:23:25,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=112400.0, ans=0.0 2024-08-09 17:23:27,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.94 vs.
limit=22.5 2024-08-09 17:23:37,742 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11250, loss[loss=0.1462, beats_loss=0.01082, ecapa_loss=0.0004527, whisper_loss=0.1309, over 22622.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01303, ecapa_loss=0.0003961, whisper_loss=0.1047, over 3917618.43 frames. ], batch size: 92, lr: 3.35e-02, grad_scale: 1024.0 2024-08-09 17:23:42,049 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 18 from Vox, 31 from AS 2024-08-09 17:23:49,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=112500.0, ans=0.0 2024-08-09 17:24:10,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112700.0, ans=0.1 2024-08-09 17:24:33,765 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 from AS 2024-08-09 17:24:47,207 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.986e+01 3.509e+01 4.225e+01 7.875e+01, threshold=7.019e+01, percent-clipped=1.0 2024-08-09 17:24:47,227 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11300, loss[loss=0.1227, beats_loss=0.01038, ecapa_loss=0.0004039, whisper_loss=0.1083, over 15446.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01304, ecapa_loss=0.0003944, whisper_loss=0.1045, over 3911250.56 frames. ], batch size: 59, lr: 3.35e-02, grad_scale: 1024.0 2024-08-09 17:24:56,235 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 from AS 2024-08-09 17:25:50,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=113400.0, ans=0.125 2024-08-09 17:25:56,704 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11350, loss[loss=0.1355, beats_loss=0.01049, ecapa_loss=0.0004192, whisper_loss=0.1208, over 20161.00 frames.
], tot_loss[loss=0.1215, beats_loss=0.01307, ecapa_loss=0.0003898, whisper_loss=0.1045, over 3937099.51 frames. ], batch size: 79, lr: 3.34e-02, grad_scale: 1024.0 2024-08-09 17:26:03,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.34 vs. limit=22.5 2024-08-09 17:26:08,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=113500.0, ans=0.125 2024-08-09 17:26:09,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-08-09 17:26:10,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=113600.0, ans=0.125 2024-08-09 17:26:32,236 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5 2024-08-09 17:26:37,376 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-09 17:26:45,700 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-09 17:26:52,648 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-09 17:26:53,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=113900.0, ans=0.2 2024-08-09 17:27:04,126 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. 
limit=6.0 2024-08-09 17:27:06,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.900e+01 3.368e+01 4.036e+01 6.013e+01, threshold=6.736e+01, percent-clipped=0.0 2024-08-09 17:27:06,477 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11400, loss[loss=0.1244, beats_loss=0.01409, ecapa_loss=0.0002976, whisper_loss=0.1073, over 14314.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01305, ecapa_loss=0.0003895, whisper_loss=0.1046, over 3945078.84 frames. ], batch size: 54, lr: 3.34e-02, grad_scale: 1024.0 2024-08-09 17:27:07,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=114000.0, ans=0.05 2024-08-09 17:27:07,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2024-08-09 17:27:14,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-08-09 17:27:59,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=114300.0, ans=0.125 2024-08-09 17:28:01,637 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 25 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-09 17:28:15,885 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11450, loss[loss=0.102, beats_loss=0.01562, ecapa_loss=0.0003631, whisper_loss=0.08271, over 14126.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01312, ecapa_loss=0.0003876, whisper_loss=0.1044, over 3916782.69 frames. ], batch size: 55, lr: 3.33e-02, grad_scale: 1024.0 2024-08-09 17:28:39,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=114600.0, ans=0.5 2024-08-09 17:29:15,743 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 17:29:22,171 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2024-08-09 17:29:26,995 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 3.054e+01 3.515e+01 4.307e+01 8.084e+01, threshold=7.029e+01, percent-clipped=1.0 2024-08-09 17:29:27,021 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11500, loss[loss=0.09822, beats_loss=0.01855, ecapa_loss=0.000304, whisper_loss=0.07663, over 23517.00 frames. ], tot_loss[loss=0.1218, beats_loss=0.01302, ecapa_loss=0.0003885, whisper_loss=0.1049, over 3948744.03 frames. ], batch size: 94, lr: 3.33e-02, grad_scale: 1024.0 2024-08-09 17:29:29,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115000.0, ans=0.1 2024-08-09 17:29:41,418 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=15.0 2024-08-09 17:30:06,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=115200.0, ans=15.0 2024-08-09 17:30:10,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.75 vs. limit=22.5 2024-08-09 17:30:13,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.11 vs. limit=22.5 2024-08-09 17:30:23,731 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2024-08-09 17:30:29,934 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
15 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 17:30:34,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=115400.0, ans=0.2 2024-08-09 17:30:39,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.12 vs. limit=12.0 2024-08-09 17:30:41,364 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11550, loss[loss=0.1161, beats_loss=0.01045, ecapa_loss=0.0004222, whisper_loss=0.1014, over 15073.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01304, ecapa_loss=0.0003881, whisper_loss=0.1048, over 3937048.80 frames. ], batch size: 60, lr: 3.32e-02, grad_scale: 1024.0 2024-08-09 17:30:45,948 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 17:30:51,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=115500.0, ans=0.125 2024-08-09 17:30:54,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=115600.0, ans=0.5 2024-08-09 17:30:59,588 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-09 17:31:06,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=115600.0, ans=0.2 2024-08-09 17:31:17,057 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 17:31:23,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=115800.0, ans=0.125 2024-08-09 17:31:38,302 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.59 vs. 
limit=10.0 2024-08-09 17:31:45,104 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 17:31:53,535 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.957e+01 3.409e+01 3.917e+01 8.485e+01, threshold=6.817e+01, percent-clipped=1.0 2024-08-09 17:31:53,555 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11600, loss[loss=0.09506, beats_loss=0.01657, ecapa_loss=0.0004083, whisper_loss=0.0744, over 20758.00 frames. ], tot_loss[loss=0.1225, beats_loss=0.01295, ecapa_loss=0.0003917, whisper_loss=0.1056, over 3926004.71 frames. ], batch size: 89, lr: 3.32e-02, grad_scale: 1024.0 2024-08-09 17:32:12,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=116100.0, ans=0.0 2024-08-09 17:32:13,359 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 17:32:15,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=116100.0, ans=0.0 2024-08-09 17:32:23,984 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 18 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-09 17:32:40,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=116300.0, ans=0.125 2024-08-09 17:32:44,667 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 18 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 17:32:46,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=116300.0, ans=0.1 2024-08-09 17:32:51,080 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-09 17:32:58,329 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
25 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-09 17:32:58,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=116400.0, ans=0.2 2024-08-09 17:32:59,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=116400.0, ans=0.95 2024-08-09 17:33:04,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=116400.0, ans=0.0 2024-08-09 17:33:07,015 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11650, loss[loss=0.1272, beats_loss=0.01115, ecapa_loss=0.0004026, whisper_loss=0.1121, over 17817.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01296, ecapa_loss=0.0003948, whisper_loss=0.1049, over 3926451.52 frames. ], batch size: 68, lr: 3.31e-02, grad_scale: 1024.0 2024-08-09 17:33:10,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=116500.0, ans=0.125 2024-08-09 17:33:10,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=116500.0, ans=0.125 2024-08-09 17:33:14,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=116500.0, ans=0.125 2024-08-09 17:33:17,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=116500.0, ans=0.0 2024-08-09 17:33:25,052 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-09 17:33:26,369 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-09 17:33:31,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=116600.0, ans=0.0 2024-08-09 17:34:00,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=116800.0, ans=0.2 2024-08-09 17:34:18,569 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 3.106e+01 3.561e+01 4.217e+01 8.775e+01, threshold=7.122e+01, percent-clipped=2.0 2024-08-09 17:34:18,590 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11700, loss[loss=0.1119, beats_loss=0.01368, ecapa_loss=0.0003914, whisper_loss=0.09428, over 21313.00 frames. ], tot_loss[loss=0.1218, beats_loss=0.01298, ecapa_loss=0.0003945, whisper_loss=0.1048, over 3927503.16 frames. ], batch size: 90, lr: 3.31e-02, grad_scale: 1024.0 2024-08-09 17:34:27,097 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 17:34:55,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=117200.0, ans=0.125 2024-08-09 17:34:58,738 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 38 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-09 17:35:00,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=117200.0, ans=0.2 2024-08-09 17:35:16,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=117400.0, ans=0.125 2024-08-09 17:35:30,740 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11750, loss[loss=0.1046, beats_loss=0.0135, ecapa_loss=0.0004074, whisper_loss=0.087, over 19659.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01301, ecapa_loss=0.0003908, whisper_loss=0.1045, over 3959109.64 frames. 
], batch size: 81, lr: 3.30e-02, grad_scale: 1024.0 2024-08-09 17:35:32,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=117500.0, ans=0.0 2024-08-09 17:35:56,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=117600.0, ans=0.125 2024-08-09 17:35:57,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=117700.0, ans=0.125 2024-08-09 17:36:30,979 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 17:36:34,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=12.0 2024-08-09 17:36:37,750 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-09 17:36:39,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=118000.0, ans=0.125 2024-08-09 17:36:40,372 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.943e+01 3.344e+01 4.022e+01 9.659e+01, threshold=6.689e+01, percent-clipped=1.0 2024-08-09 17:36:40,392 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11800, loss[loss=0.1121, beats_loss=0.01496, ecapa_loss=0.0003813, whisper_loss=0.09334, over 22918.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01302, ecapa_loss=0.000389, whisper_loss=0.1046, over 3935007.98 frames. ], batch size: 93, lr: 3.30e-02, grad_scale: 1024.0 2024-08-09 17:37:00,558 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.18 vs. 
limit=15.0 2024-08-09 17:37:01,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=118100.0, ans=0.0 2024-08-09 17:37:12,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.38 vs. limit=15.0 2024-08-09 17:37:15,563 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-09 17:37:34,659 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 34 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 17:37:39,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.22 vs. limit=15.0 2024-08-09 17:37:43,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=118400.0, ans=0.125 2024-08-09 17:37:46,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=118400.0, ans=0.125 2024-08-09 17:37:46,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=118400.0, ans=0.125 2024-08-09 17:37:46,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=118400.0, ans=0.1 2024-08-09 17:37:49,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2024-08-09 17:37:51,477 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11850, loss[loss=0.1286, beats_loss=0.01176, ecapa_loss=0.0004696, whisper_loss=0.1121, over 21495.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01307, ecapa_loss=0.0003923, whisper_loss=0.1038, over 3936961.04 frames. 
], batch size: 87, lr: 3.29e-02, grad_scale: 1024.0 2024-08-09 17:37:54,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=118500.0, ans=0.125 2024-08-09 17:38:03,021 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=12.0 2024-08-09 17:38:04,487 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.43 vs. limit=22.5 2024-08-09 17:38:05,198 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 17:38:42,837 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2024-08-09 17:38:47,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=12.0 2024-08-09 17:38:49,337 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-09 17:38:59,149 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2024-08-09 17:39:03,661 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.477e+01 2.937e+01 3.452e+01 4.190e+01 6.711e+01, threshold=6.904e+01, percent-clipped=1.0 2024-08-09 17:39:03,681 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11900, loss[loss=0.1325, beats_loss=0.01312, ecapa_loss=0.0004336, whisper_loss=0.1151, over 22601.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01314, ecapa_loss=0.0003909, whisper_loss=0.1034, over 3921682.04 frames. 
], batch size: 93, lr: 3.29e-02, grad_scale: 1024.0 2024-08-09 17:39:24,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=119100.0, ans=0.09899494936611666 2024-08-09 17:39:28,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=119100.0, ans=0.125 2024-08-09 17:39:45,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=119200.0, ans=0.125 2024-08-09 17:40:02,149 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2024-08-09 17:40:17,602 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 11950, loss[loss=0.1331, beats_loss=0.009257, ecapa_loss=0.0005075, whisper_loss=0.1188, over 18684.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01308, ecapa_loss=0.0003927, whisper_loss=0.1038, over 3931666.08 frames. ], batch size: 79, lr: 3.28e-02, grad_scale: 1024.0 2024-08-09 17:40:28,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119500.0, ans=0.1 2024-08-09 17:40:29,577 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-09 17:40:30,182 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. 
limit=15.0 2024-08-09 17:40:41,155 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.597e+00 2024-08-09 17:40:53,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=119700.0, ans=0.0 2024-08-09 17:41:08,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=119800.0, ans=0.0 2024-08-09 17:41:12,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119800.0, ans=0.1 2024-08-09 17:41:30,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.97 vs. limit=22.5 2024-08-09 17:41:31,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119900.0, ans=0.1 2024-08-09 17:41:32,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.48 vs. limit=6.0 2024-08-09 17:41:36,913 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.930e+01 3.462e+01 4.384e+01 7.473e+01, threshold=6.925e+01, percent-clipped=1.0 2024-08-09 17:41:36,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12000, loss[loss=0.1178, beats_loss=0.01489, ecapa_loss=0.00037, whisper_loss=0.09924, over 21814.00 frames. ], tot_loss[loss=0.121, beats_loss=0.01309, ecapa_loss=0.0003903, whisper_loss=0.104, over 3927775.03 frames. 
], batch size: 86, lr: 3.28e-02, grad_scale: 2048.0 2024-08-09 17:41:36,934 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-09 17:42:24,852 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on ASR_libri: loss=0.2866, beats_loss=0, ecapa_loss=0.00111, whisper_loss=0.2755, over 922467.00 frames. 2024-08-09 17:42:44,934 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on SV_voxceleb1: loss=0.01049, beats_loss=0, ecapa_loss=0.001049, whisper_loss=0, over 939242.00 frames. 2024-08-09 17:43:06,599 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.9432, 1.5795, 2.0170, 1.7316], device='cuda:2') 2024-08-09 17:44:38,329 INFO [train_multi_KD3.py:1149] (2/4) Epoch 1, validation on AT_audioset: loss=0.03131, beats_loss=0.03131, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 17:44:38,332 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-09 17:44:38,947 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.25 vs. limit=22.5 2024-08-09 17:44:40,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.26 vs. limit=15.0 2024-08-09 17:44:48,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.00 vs. limit=15.0 2024-08-09 17:44:59,897 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2024-08-09 17:45:04,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=120100.0, ans=0.125 2024-08-09 17:45:09,115 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
27 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-09 17:45:42,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=120400.0, ans=15.0 2024-08-09 17:45:46,584 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-09 17:45:54,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=120400.0, ans=0.05 2024-08-09 17:45:57,304 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12050, loss[loss=0.1472, beats_loss=0.009267, ecapa_loss=0.0003867, whisper_loss=0.1341, over 14069.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01315, ecapa_loss=0.0003875, whisper_loss=0.1038, over 3889143.61 frames. ], batch size: 53, lr: 3.27e-02, grad_scale: 2048.0 2024-08-09 17:45:59,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=120500.0, ans=0.0 2024-08-09 17:46:09,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=120500.0, ans=0.125 2024-08-09 17:46:26,393 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2024-08-09 17:46:38,469 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-09 17:46:44,950 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 17:46:49,178 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
23 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-09 17:46:50,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120800.0, ans=0.1 2024-08-09 17:46:52,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=120800.0, ans=0.025 2024-08-09 17:47:01,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=120900.0, ans=0.1 2024-08-09 17:47:08,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120900.0, ans=0.1 2024-08-09 17:47:11,395 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.398e+01 2024-08-09 17:47:12,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.990e+01 3.554e+01 4.139e+01 7.218e+01, threshold=7.107e+01, percent-clipped=1.0 2024-08-09 17:47:12,225 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12100, loss[loss=0.1046, beats_loss=0.01206, ecapa_loss=0.0002894, whisper_loss=0.08964, over 15358.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.0131, ecapa_loss=0.0003898, whisper_loss=0.1033, over 3873351.69 frames. ], batch size: 56, lr: 3.27e-02, grad_scale: 2048.0 2024-08-09 17:47:22,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=121000.0, ans=0.0 2024-08-09 17:47:23,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=121000.0, ans=0.0 2024-08-09 17:47:30,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.28 vs. limit=22.5 2024-08-09 17:47:37,798 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 17:47:40,701 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 19 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-09 17:47:47,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. limit=6.0 2024-08-09 17:47:51,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=121200.0, ans=0.125 2024-08-09 17:48:24,760 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-09 17:48:29,402 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12150, loss[loss=0.09516, beats_loss=0.01011, ecapa_loss=0.000428, whisper_loss=0.08077, over 13566.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01308, ecapa_loss=0.0003878, whisper_loss=0.1031, over 3855413.76 frames. ], batch size: 54, lr: 3.26e-02, grad_scale: 2048.0 2024-08-09 17:48:34,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=121500.0, ans=0.125 2024-08-09 17:48:35,617 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-09 17:48:48,102 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-09 17:48:55,493 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-09 17:49:11,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=121700.0, ans=0.0 2024-08-09 17:49:16,464 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 17:49:24,743 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.15 vs. 
limit=10.0 2024-08-09 17:49:25,593 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-09 17:49:35,466 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-09 17:49:45,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=20.11 vs. limit=15.0 2024-08-09 17:49:45,991 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.869e+01 3.277e+01 4.136e+01 6.270e+01, threshold=6.555e+01, percent-clipped=0.0 2024-08-09 17:49:46,027 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12200, loss[loss=0.1376, beats_loss=0.01243, ecapa_loss=0.0003233, whisper_loss=0.1219, over 16440.00 frames. ], tot_loss[loss=0.1202, beats_loss=0.01307, ecapa_loss=0.000387, whisper_loss=0.1033, over 3858922.60 frames. ], batch size: 61, lr: 3.26e-02, grad_scale: 2048.0 2024-08-09 17:49:47,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=122000.0, ans=0.125 2024-08-09 17:49:59,178 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2024-08-09 17:50:51,805 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-09 17:50:54,384 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-09 17:51:00,274 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-09 17:51:01,661 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12250, loss[loss=0.1344, beats_loss=0.01402, ecapa_loss=0.0003453, whisper_loss=0.1169, over 22731.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01308, ecapa_loss=0.0003848, whisper_loss=0.1032, over 3845990.96 frames. 
], batch size: 92, lr: 3.25e-02, grad_scale: 2048.0 2024-08-09 17:51:31,566 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 17:51:39,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=122700.0, ans=0.125 2024-08-09 17:51:41,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=122700.0, ans=0.125 2024-08-09 17:51:44,787 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-09 17:51:52,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=122800.0, ans=0.125 2024-08-09 17:51:54,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=122800.0, ans=0.0 2024-08-09 17:52:08,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=122900.0, ans=0.035 2024-08-09 17:52:17,131 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.334e+01 2.887e+01 3.272e+01 4.030e+01 7.099e+01, threshold=6.544e+01, percent-clipped=1.0 2024-08-09 17:52:17,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12300, loss[loss=0.1112, beats_loss=0.01161, ecapa_loss=0.0004042, whisper_loss=0.09558, over 14086.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.01316, ecapa_loss=0.000383, whisper_loss=0.1029, over 3830585.51 frames. ], batch size: 57, lr: 3.25e-02, grad_scale: 2048.0 2024-08-09 17:52:27,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=123000.0, ans=0.0 2024-08-09 17:52:33,601 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
25 from LS+wenet, 12 from Vox, 18 fro AS 2024-08-09 17:52:52,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=123200.0, ans=0.0 2024-08-09 17:53:05,752 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 17:53:07,142 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 23 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-09 17:53:08,492 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-09 17:53:31,800 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12350, loss[loss=0.1021, beats_loss=0.01537, ecapa_loss=0.0004181, whisper_loss=0.08256, over 15741.00 frames. ], tot_loss[loss=0.12, beats_loss=0.0131, ecapa_loss=0.0003874, whisper_loss=0.103, over 3841635.59 frames. ], batch size: 66, lr: 3.24e-02, grad_scale: 2048.0 2024-08-09 17:53:34,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=123500.0, ans=0.5 2024-08-09 17:53:36,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=123500.0, ans=0.125 2024-08-09 17:54:05,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.54 vs. limit=10.0 2024-08-09 17:54:12,946 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2024-08-09 17:54:13,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=123700.0, ans=0.1 2024-08-09 17:54:18,489 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-09 17:54:44,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=123900.0, ans=0.125 2024-08-09 17:54:45,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=123900.0, ans=0.125 2024-08-09 17:54:48,381 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 3.013e+01 3.404e+01 4.023e+01 7.879e+01, threshold=6.808e+01, percent-clipped=3.0 2024-08-09 17:54:48,401 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12400, loss[loss=0.1268, beats_loss=0.01205, ecapa_loss=0.0004261, whisper_loss=0.1105, over 20160.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01305, ecapa_loss=0.0003861, whisper_loss=0.1039, over 3853411.26 frames. ], batch size: 85, lr: 3.24e-02, grad_scale: 2048.0 2024-08-09 17:55:22,544 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 17:55:24,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=124200.0, ans=0.125 2024-08-09 17:55:27,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=124200.0, ans=0.2 2024-08-09 17:55:35,457 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 17:55:38,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=124300.0, ans=0.1 2024-08-09 17:56:00,990 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12450, loss[loss=0.1108, beats_loss=0.01302, ecapa_loss=0.0005435, whisper_loss=0.09236, over 18333.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01309, ecapa_loss=0.0003854, whisper_loss=0.1034, over 3859283.24 frames. 
], batch size: 80, lr: 3.23e-02, grad_scale: 2048.0 2024-08-09 17:56:03,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=124500.0, ans=0.125 2024-08-09 17:56:06,230 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-09 17:56:10,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=124500.0, ans=0.125 2024-08-09 17:56:19,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=124600.0, ans=0.0 2024-08-09 17:56:25,061 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-09 17:56:35,653 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 17:56:36,574 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2024-08-09 17:56:49,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=124800.0, ans=0.125 2024-08-09 17:57:03,479 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 32 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-09 17:57:05,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=124900.0, ans=0.2 2024-08-09 17:57:14,286 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 2.994e+01 3.498e+01 4.030e+01 6.153e+01, threshold=6.996e+01, percent-clipped=0.0 2024-08-09 17:57:14,307 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12500, loss[loss=0.1553, beats_loss=0.007504, ecapa_loss=0.0004103, whisper_loss=0.1437, over 16169.00 frames. 
], tot_loss[loss=0.1201, beats_loss=0.01307, ecapa_loss=0.000384, whisper_loss=0.1032, over 3856531.51 frames. ], batch size: 62, lr: 3.23e-02, grad_scale: 2048.0 2024-08-09 17:57:21,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.82 vs. limit=10.0 2024-08-09 17:57:23,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.57 vs. limit=22.5 2024-08-09 17:57:56,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.68 vs. limit=10.0 2024-08-09 17:58:27,940 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 17:58:28,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=125500.0, ans=0.07 2024-08-09 17:58:28,852 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12550, loss[loss=0.1194, beats_loss=0.0142, ecapa_loss=0.0003723, whisper_loss=0.1015, over 20603.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01307, ecapa_loss=0.0003832, whisper_loss=0.1035, over 3886524.04 frames. ], batch size: 84, lr: 3.22e-02, grad_scale: 2048.0 2024-08-09 17:58:51,002 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
40 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-09 17:58:52,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=125600.0, ans=0.125 2024-08-09 17:58:57,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=125600.0, ans=0.125 2024-08-09 17:59:11,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=125700.0, ans=0.2 2024-08-09 17:59:12,348 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.29 vs. limit=15.0 2024-08-09 17:59:15,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=125800.0, ans=0.0 2024-08-09 17:59:18,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=125800.0, ans=0.125 2024-08-09 17:59:30,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.86 vs. limit=22.5 2024-08-09 17:59:42,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=126000.0, ans=0.125 2024-08-09 17:59:43,271 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 3.067e+01 3.520e+01 4.433e+01 6.633e+01, threshold=7.039e+01, percent-clipped=0.0 2024-08-09 17:59:43,307 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12600, loss[loss=0.1316, beats_loss=0.01273, ecapa_loss=0.0005157, whisper_loss=0.1137, over 21680.00 frames. ], tot_loss[loss=0.1212, beats_loss=0.0131, ecapa_loss=0.0003871, whisper_loss=0.1043, over 3901141.27 frames. 
], batch size: 93, lr: 3.22e-02, grad_scale: 2048.0 2024-08-09 17:59:49,468 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-09 17:59:52,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0 2024-08-09 18:00:16,780 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 22 from LS+wenet, 22 from Vox, 51 fro AS 2024-08-09 18:00:34,832 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-09 18:00:36,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=126300.0, ans=0.125 2024-08-09 18:00:39,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=126300.0, ans=0.1 2024-08-09 18:00:40,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=126400.0, ans=0.0 2024-08-09 18:00:41,643 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 18:00:50,330 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 18:00:55,654 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12650, loss[loss=0.1423, beats_loss=0.01018, ecapa_loss=0.0003553, whisper_loss=0.1285, over 22581.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01317, ecapa_loss=0.0003873, whisper_loss=0.1034, over 3908772.99 frames. 
], batch size: 85, lr: 3.21e-02, grad_scale: 2048.0 2024-08-09 18:01:14,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=126600.0, ans=0.1 2024-08-09 18:01:24,104 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.50 vs. limit=22.5 2024-08-09 18:01:48,829 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 33 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-09 18:01:50,301 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-09 18:01:55,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2024-08-09 18:02:04,065 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-09 18:02:08,588 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.930e+01 3.194e+01 3.853e+01 8.153e+01, threshold=6.388e+01, percent-clipped=1.0 2024-08-09 18:02:08,608 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12700, loss[loss=0.1183, beats_loss=0.01438, ecapa_loss=0.0003984, whisper_loss=0.0999, over 19659.00 frames. ], tot_loss[loss=0.1207, beats_loss=0.01309, ecapa_loss=0.0003863, whisper_loss=0.1037, over 3920017.06 frames. ], batch size: 80, lr: 3.21e-02, grad_scale: 2048.0 2024-08-09 18:02:34,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=12.0 2024-08-09 18:02:37,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=127200.0, ans=0.05 2024-08-09 18:02:38,056 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
20 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-09 18:02:57,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=127300.0, ans=0.125 2024-08-09 18:03:01,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=127300.0, ans=0.1 2024-08-09 18:03:01,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0 2024-08-09 18:03:09,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=127400.0, ans=0.125 2024-08-09 18:03:22,557 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12750, loss[loss=0.1411, beats_loss=0.01309, ecapa_loss=0.0004194, whisper_loss=0.1239, over 21831.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.01322, ecapa_loss=0.0003851, whisper_loss=0.1028, over 3908847.32 frames. ], batch size: 88, lr: 3.20e-02, grad_scale: 2048.0 2024-08-09 18:03:46,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=127600.0, ans=0.125 2024-08-09 18:03:54,698 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 18:04:13,606 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.87 vs. 
limit=22.5 2024-08-09 18:04:26,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=127900.0, ans=0.0 2024-08-09 18:04:33,348 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 3.049e+01 3.510e+01 3.985e+01 5.812e+01, threshold=7.020e+01, percent-clipped=0.0 2024-08-09 18:04:33,369 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12800, loss[loss=0.1162, beats_loss=0.01525, ecapa_loss=0.0003263, whisper_loss=0.09769, over 20299.00 frames. ], tot_loss[loss=0.1197, beats_loss=0.01332, ecapa_loss=0.0003851, whisper_loss=0.1025, over 3932383.25 frames. ], batch size: 80, lr: 3.20e-02, grad_scale: 2048.0 2024-08-09 18:04:36,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=128000.0, ans=0.125 2024-08-09 18:04:58,225 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-09 18:05:01,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128200.0, ans=0.1 2024-08-09 18:05:08,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=128200.0, ans=0.125 2024-08-09 18:05:27,895 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 18:05:44,005 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12850, loss[loss=0.1153, beats_loss=0.01498, ecapa_loss=0.0003486, whisper_loss=0.09682, over 19519.00 frames. ], tot_loss[loss=0.1192, beats_loss=0.01338, ecapa_loss=0.0003824, whisper_loss=0.102, over 3902715.30 frames. ], batch size: 78, lr: 3.19e-02, grad_scale: 2048.0 2024-08-09 18:05:50,337 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.70 vs. 
limit=8.0 2024-08-09 18:05:52,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=128500.0, ans=0.0 2024-08-09 18:05:53,533 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 18:06:02,794 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.68 vs. limit=22.5 2024-08-09 18:06:03,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=128600.0, ans=0.125 2024-08-09 18:06:22,532 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-09 18:06:35,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.55 vs. limit=15.0 2024-08-09 18:06:39,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=128800.0, ans=0.125 2024-08-09 18:06:39,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=128800.0, ans=0.0 2024-08-09 18:06:43,340 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-09 18:06:52,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=128900.0, ans=0.0 2024-08-09 18:06:57,078 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.723e+01 3.295e+01 4.012e+01 6.106e+01, threshold=6.589e+01, percent-clipped=0.0 2024-08-09 18:06:57,113 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12900, loss[loss=0.1073, beats_loss=0.01359, ecapa_loss=0.0003282, whisper_loss=0.09041, over 16394.00 frames. 
], tot_loss[loss=0.1188, beats_loss=0.01336, ecapa_loss=0.0003812, whisper_loss=0.1016, over 3885488.89 frames. ], batch size: 64, lr: 3.19e-02, grad_scale: 2048.0 2024-08-09 18:07:28,553 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 27 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-09 18:07:30,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=129200.0, ans=0.125 2024-08-09 18:07:53,255 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-09 18:08:08,637 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 12950, loss[loss=0.1319, beats_loss=0.01331, ecapa_loss=0.000293, whisper_loss=0.1156, over 16106.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01331, ecapa_loss=0.0003814, whisper_loss=0.1017, over 3884058.12 frames. ], batch size: 60, lr: 3.19e-02, grad_scale: 2048.0 2024-08-09 18:08:13,052 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 18:08:27,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-08-09 18:08:30,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=129600.0, ans=0.0 2024-08-09 18:08:32,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=129600.0, ans=0.125 2024-08-09 18:08:47,812 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 34 from Vox, 33 fro AS 2024-08-09 18:08:50,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=129800.0, ans=0.125 2024-08-09 18:09:01,667 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 18:09:10,368 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-09 18:09:24,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+01 3.005e+01 3.464e+01 3.958e+01 5.866e+01, threshold=6.929e+01, percent-clipped=0.0 2024-08-09 18:09:24,398 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13000, loss[loss=0.1294, beats_loss=0.01224, ecapa_loss=0.0004506, whisper_loss=0.1127, over 21680.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01328, ecapa_loss=0.0003831, whisper_loss=0.1011, over 3881415.77 frames. ], batch size: 92, lr: 3.18e-02, grad_scale: 2048.0 2024-08-09 18:09:25,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=130000.0, ans=0.125 2024-08-09 18:09:53,079 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-09 18:09:54,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=130200.0, ans=0.0 2024-08-09 18:10:06,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=130300.0, ans=0.0 2024-08-09 18:10:06,759 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.26 vs. limit=6.0 2024-08-09 18:10:10,532 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.07 vs. 
limit=15.0 2024-08-09 18:10:11,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=130300.0, ans=0.025 2024-08-09 18:10:14,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=130300.0, ans=0.07 2024-08-09 18:10:16,735 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-09 18:10:38,070 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13050, loss[loss=0.138, beats_loss=0.01078, ecapa_loss=0.0004131, whisper_loss=0.1231, over 22481.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01315, ecapa_loss=0.0003822, whisper_loss=0.1016, over 3856737.17 frames. ], batch size: 92, lr: 3.18e-02, grad_scale: 2048.0 2024-08-09 18:10:42,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=130500.0, ans=0.0 2024-08-09 18:10:50,303 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-09 18:10:59,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=130600.0, ans=0.125 2024-08-09 18:11:04,409 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-09 18:11:14,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=130700.0, ans=0.035 2024-08-09 18:11:17,604 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-09 18:11:20,540 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2024-08-09 18:11:33,963 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-09 18:11:39,622 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 18:11:42,335 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 18:11:55,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=130900.0, ans=0.125 2024-08-09 18:12:08,436 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.885e+01 3.590e+01 4.189e+01 8.103e+01, threshold=7.179e+01, percent-clipped=1.0 2024-08-09 18:12:08,457 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13100, loss[loss=0.1228, beats_loss=0.0143, ecapa_loss=0.0003057, whisper_loss=0.1055, over 22569.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01318, ecapa_loss=0.0003794, whisper_loss=0.1017, over 3843973.22 frames. ], batch size: 88, lr: 3.17e-02, grad_scale: 2048.0 2024-08-09 18:12:10,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=131000.0, ans=0.125 2024-08-09 18:12:21,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131100.0, ans=0.1 2024-08-09 18:12:29,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.17 vs. limit=10.0 2024-08-09 18:13:41,052 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13150, loss[loss=0.1064, beats_loss=0.01236, ecapa_loss=0.0003476, whisper_loss=0.0906, over 14395.00 frames. ], tot_loss[loss=0.1194, beats_loss=0.01303, ecapa_loss=0.0003818, whisper_loss=0.1026, over 3856773.87 frames. ], batch size: 55, lr: 3.17e-02, grad_scale: 2048.0 2024-08-09 18:13:42,439 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
20 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-09 18:14:06,348 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 40 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-09 18:14:06,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=131600.0, ans=0.0 2024-08-09 18:14:47,733 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-09 18:15:07,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=131900.0, ans=0.2 2024-08-09 18:15:13,373 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 18:15:14,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.35 vs. limit=10.0 2024-08-09 18:15:15,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=131900.0, ans=0.025 2024-08-09 18:15:23,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=131900.0, ans=0.1 2024-08-09 18:15:31,472 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.963e+01 3.357e+01 4.080e+01 6.559e+01, threshold=6.714e+01, percent-clipped=0.0 2024-08-09 18:15:31,504 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13200, loss[loss=0.1324, beats_loss=0.01118, ecapa_loss=0.0004011, whisper_loss=0.1172, over 13274.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01311, ecapa_loss=0.0003828, whisper_loss=0.1021, over 3849053.47 frames. ], batch size: 53, lr: 3.16e-02, grad_scale: 2048.0 2024-08-09 18:15:37,226 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.75 vs. 
limit=15.0 2024-08-09 18:15:46,734 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 18:15:54,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=132100.0, ans=0.0 2024-08-09 18:15:58,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=132100.0, ans=0.125 2024-08-09 18:15:58,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=132100.0, ans=0.0 2024-08-09 18:16:11,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.35 vs. limit=22.5 2024-08-09 18:16:13,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=132200.0, ans=0.125 2024-08-09 18:16:33,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=12.0 2024-08-09 18:16:35,303 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.49 vs. limit=6.0 2024-08-09 18:16:37,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=132300.0, ans=0.1 2024-08-09 18:16:46,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=132300.0, ans=0.125 2024-08-09 18:17:10,938 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-09 18:17:16,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13250, loss[loss=0.1017, beats_loss=0.01328, ecapa_loss=0.0003194, whisper_loss=0.08525, over 15766.00 frames. 
], tot_loss[loss=0.1196, beats_loss=0.01295, ecapa_loss=0.0003884, whisper_loss=0.1027, over 3861143.69 frames. ], batch size: 61, lr: 3.16e-02, grad_scale: 2048.0 2024-08-09 18:17:28,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=132500.0, ans=0.0 2024-08-09 18:17:43,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=132600.0, ans=0.125 2024-08-09 18:17:45,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2024-08-09 18:17:57,930 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 18:18:07,420 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.801e-02 2024-08-09 18:18:18,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=132800.0, ans=0.125 2024-08-09 18:18:29,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=132900.0, ans=0.1 2024-08-09 18:18:40,091 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+01 2.970e+01 3.375e+01 4.348e+01 9.574e+01, threshold=6.749e+01, percent-clipped=3.0 2024-08-09 18:18:40,124 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13300, loss[loss=0.1481, beats_loss=0.01316, ecapa_loss=0.0002787, whisper_loss=0.1322, over 16613.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01303, ecapa_loss=0.0003852, whisper_loss=0.1035, over 3879975.71 frames. 
], batch size: 55, lr: 3.15e-02, grad_scale: 2048.0 2024-08-09 18:18:43,953 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.342e-01 2024-08-09 18:18:49,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=133000.0, ans=0.125 2024-08-09 18:18:50,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=12.0 2024-08-09 18:19:01,525 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.489e+00 2024-08-09 18:19:22,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=133300.0, ans=0.125 2024-08-09 18:19:22,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=133300.0, ans=0.125 2024-08-09 18:19:28,373 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.183e-01 2024-08-09 18:19:28,659 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.82 vs. limit=22.5 2024-08-09 18:19:45,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=133400.0, ans=0.0 2024-08-09 18:19:49,156 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-09 18:19:50,148 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13350, loss[loss=0.1442, beats_loss=0.01045, ecapa_loss=0.000444, whisper_loss=0.1293, over 22598.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.01305, ecapa_loss=0.000386, whisper_loss=0.103, over 3900275.16 frames. 
], batch size: 89, lr: 3.15e-02, grad_scale: 2048.0 2024-08-09 18:19:55,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133500.0, ans=0.1 2024-08-09 18:20:03,521 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-09 18:20:08,947 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.13 vs. limit=10.0 2024-08-09 18:20:26,714 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 38 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-09 18:20:31,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=133700.0, ans=0.0 2024-08-09 18:20:39,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.77 vs. limit=15.0 2024-08-09 18:21:03,440 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 3.029e+01 3.343e+01 3.897e+01 6.977e+01, threshold=6.687e+01, percent-clipped=1.0 2024-08-09 18:21:03,466 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13400, loss[loss=0.1244, beats_loss=0.01414, ecapa_loss=0.0003474, whisper_loss=0.1068, over 22728.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01306, ecapa_loss=0.0003836, whisper_loss=0.1035, over 3898963.71 frames. ], batch size: 92, lr: 3.14e-02, grad_scale: 2048.0 2024-08-09 18:21:04,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. 
limit=6.0 2024-08-09 18:21:06,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=134000.0, ans=0.125 2024-08-09 18:21:06,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=134000.0, ans=0.125 2024-08-09 18:21:18,379 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.42 vs. limit=22.5 2024-08-09 18:21:34,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=134200.0, ans=0.04949747468305833 2024-08-09 18:21:37,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=134200.0, ans=0.1 2024-08-09 18:21:38,883 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 11 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-09 18:21:40,590 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 18:21:43,019 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-09 18:21:56,039 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0 2024-08-09 18:22:05,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=134400.0, ans=0.09899494936611666 2024-08-09 18:22:08,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=134400.0, ans=0.0 2024-08-09 18:22:13,341 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13450, loss[loss=0.119, beats_loss=0.01352, ecapa_loss=0.0003812, whisper_loss=0.1017, over 22521.00 frames. 
], tot_loss[loss=0.1202, beats_loss=0.01306, ecapa_loss=0.000383, whisper_loss=0.1033, over 3897482.72 frames. ], batch size: 90, lr: 3.14e-02, grad_scale: 2048.0 2024-08-09 18:22:24,425 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 18:22:29,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=134600.0, ans=0.125 2024-08-09 18:22:32,240 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2024-08-09 18:22:37,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=134600.0, ans=0.0 2024-08-09 18:22:40,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=134700.0, ans=0.0 2024-08-09 18:22:40,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.27 vs. limit=15.0 2024-08-09 18:22:45,845 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 29 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-09 18:22:49,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=134700.0, ans=0.125 2024-08-09 18:22:54,007 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-09 18:23:05,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=134800.0, ans=0.125 2024-08-09 18:23:06,909 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 18:23:11,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=134900.0, ans=0.125 2024-08-09 18:23:23,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.856e+01 3.489e+01 4.024e+01 6.380e+01, threshold=6.978e+01, percent-clipped=0.0 2024-08-09 18:23:23,150 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13500, loss[loss=0.1271, beats_loss=0.00964, ecapa_loss=0.0005343, whisper_loss=0.1121, over 16207.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01306, ecapa_loss=0.0003858, whisper_loss=0.1031, over 3899386.94 frames. ], batch size: 67, lr: 3.14e-02, grad_scale: 2048.0 2024-08-09 18:23:30,536 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-09 18:23:40,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=135100.0, ans=0.2 2024-08-09 18:23:45,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0 2024-08-09 18:23:56,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. 
limit=15.0 2024-08-09 18:24:11,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=135300.0, ans=0.125 2024-08-09 18:24:26,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=135400.0, ans=0.0 2024-08-09 18:24:31,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=135400.0, ans=0.125 2024-08-09 18:24:33,430 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-09 18:24:34,595 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13550, loss[loss=0.1396, beats_loss=0.01068, ecapa_loss=0.000446, whisper_loss=0.1245, over 15775.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01293, ecapa_loss=0.0003856, whisper_loss=0.1035, over 3890078.57 frames. ], batch size: 63, lr: 3.13e-02, grad_scale: 2048.0 2024-08-09 18:24:49,457 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 18:25:23,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=135800.0, ans=0.0 2024-08-09 18:25:23,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-08-09 18:25:35,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=135900.0, ans=0.2 2024-08-09 18:25:47,057 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 3.070e+01 3.576e+01 4.104e+01 5.875e+01, threshold=7.153e+01, percent-clipped=0.0 2024-08-09 18:25:47,082 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13600, loss[loss=0.08889, beats_loss=0.01677, ecapa_loss=0.0003742, whisper_loss=0.06838, over 14117.00 frames. 
], tot_loss[loss=0.1208, beats_loss=0.01295, ecapa_loss=0.0003865, whisper_loss=0.104, over 3893507.23 frames. ], batch size: 58, lr: 3.13e-02, grad_scale: 2048.0 2024-08-09 18:25:50,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=136000.0, ans=0.125 2024-08-09 18:25:53,682 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.96 vs. limit=15.0 2024-08-09 18:25:57,839 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 18:26:04,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=136100.0, ans=0.0 2024-08-09 18:26:28,243 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.275e+02 2024-08-09 18:26:34,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=136300.0, ans=0.125 2024-08-09 18:26:36,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. limit=6.0 2024-08-09 18:26:39,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=136300.0, ans=0.125 2024-08-09 18:26:41,966 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-09 18:26:44,411 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 12 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 18:26:47,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. 
limit=6.0 2024-08-09 18:26:53,333 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-09 18:26:53,807 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2024-08-09 18:26:58,634 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13650, loss[loss=0.1183, beats_loss=0.01306, ecapa_loss=0.0003507, whisper_loss=0.1017, over 21760.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01295, ecapa_loss=0.0003865, whisper_loss=0.104, over 3888975.67 frames. ], batch size: 87, lr: 3.12e-02, grad_scale: 2048.0 2024-08-09 18:27:02,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=136500.0, ans=0.125 2024-08-09 18:27:05,919 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 18:27:12,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136600.0, ans=0.1 2024-08-09 18:27:13,211 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-09 18:27:18,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=136600.0, ans=0.2 2024-08-09 18:27:18,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=136600.0, ans=0.125 2024-08-09 18:27:38,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.55 vs. 
limit=15.0 2024-08-09 18:27:43,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=136800.0, ans=0.125 2024-08-09 18:27:54,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=136900.0, ans=0.125 2024-08-09 18:28:09,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.873e+01 3.233e+01 3.836e+01 5.786e+01, threshold=6.466e+01, percent-clipped=0.0 2024-08-09 18:28:09,778 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13700, loss[loss=0.1545, beats_loss=0.01199, ecapa_loss=0.0003892, whisper_loss=0.1387, over 22956.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01294, ecapa_loss=0.000384, whisper_loss=0.104, over 3891366.14 frames. ], batch size: 91, lr: 3.12e-02, grad_scale: 2048.0 2024-08-09 18:28:20,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=137000.0, ans=0.035 2024-08-09 18:28:24,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=137100.0, ans=0.5 2024-08-09 18:28:34,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=137100.0, ans=0.2 2024-08-09 18:28:34,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=137100.0, ans=0.0 2024-08-09 18:28:44,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2024-08-09 18:28:54,055 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
27 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 18:29:01,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=137300.0, ans=0.125 2024-08-09 18:29:05,109 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-09 18:29:08,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2024-08-09 18:29:11,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-08-09 18:29:20,259 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13750, loss[loss=0.1301, beats_loss=0.01125, ecapa_loss=0.0004302, whisper_loss=0.1146, over 22015.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01285, ecapa_loss=0.0003801, whisper_loss=0.1046, over 3882993.12 frames. ], batch size: 90, lr: 3.11e-02, grad_scale: 2048.0 2024-08-09 18:29:27,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=137500.0, ans=0.1 2024-08-09 18:29:30,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.98 vs. limit=15.0 2024-08-09 18:29:39,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=137600.0, ans=0.2 2024-08-09 18:29:46,757 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 18:29:55,247 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
20 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 18:29:59,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=137700.0, ans=0.0 2024-08-09 18:30:01,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=137800.0, ans=0.125 2024-08-09 18:30:21,110 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 36 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-09 18:30:28,785 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+01 2.986e+01 3.490e+01 4.118e+01 8.159e+01, threshold=6.980e+01, percent-clipped=6.0 2024-08-09 18:30:28,810 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13800, loss[loss=0.1188, beats_loss=0.01345, ecapa_loss=0.0003211, whisper_loss=0.1021, over 18780.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01294, ecapa_loss=0.0003764, whisper_loss=0.1041, over 3909234.01 frames. ], batch size: 72, lr: 3.11e-02, grad_scale: 2048.0 2024-08-09 18:30:39,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=138000.0, ans=0.0 2024-08-09 18:30:47,088 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 18:30:53,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-08-09 18:31:07,317 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 24 from LS+wenet, 10 from Vox, 22 fro AS 2024-08-09 18:31:27,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=138400.0, ans=0.035 2024-08-09 18:31:28,756 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.85 vs. 
limit=12.0 2024-08-09 18:31:30,662 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-09 18:31:35,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138400.0, ans=0.1 2024-08-09 18:31:37,274 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13850, loss[loss=0.1315, beats_loss=0.01209, ecapa_loss=0.0003842, whisper_loss=0.1156, over 15551.00 frames. ], tot_loss[loss=0.1207, beats_loss=0.01288, ecapa_loss=0.0003776, whisper_loss=0.1041, over 3908851.84 frames. ], batch size: 62, lr: 3.11e-02, grad_scale: 2048.0 2024-08-09 18:31:42,989 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-09 18:32:12,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=138700.0, ans=0.2 2024-08-09 18:32:23,685 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 18:32:36,378 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-09 18:32:49,939 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 2.816e+01 3.337e+01 3.813e+01 6.629e+01, threshold=6.673e+01, percent-clipped=0.0 2024-08-09 18:32:49,966 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13900, loss[loss=0.08803, beats_loss=0.01403, ecapa_loss=0.0003737, whisper_loss=0.07026, over 15681.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.01293, ecapa_loss=0.0003757, whisper_loss=0.1032, over 3874931.99 frames. ], batch size: 65, lr: 3.10e-02, grad_scale: 2048.0 2024-08-09 18:32:50,191 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 18:32:52,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2024-08-09 18:32:56,956 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-09 18:33:00,902 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-09 18:33:04,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=139100.0, ans=0.2 2024-08-09 18:33:11,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139100.0, ans=0.1 2024-08-09 18:33:14,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=139100.0, ans=0.0 2024-08-09 18:33:22,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=139200.0, ans=0.0 2024-08-09 18:33:28,511 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2024-08-09 18:33:33,892 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-09 18:33:59,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2024-08-09 18:34:00,037 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 13950, loss[loss=0.1171, beats_loss=0.01088, ecapa_loss=0.0004334, whisper_loss=0.1018, over 21188.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01294, ecapa_loss=0.000376, whisper_loss=0.1028, over 3884134.49 frames. 
], batch size: 88, lr: 3.10e-02, grad_scale: 2048.0 2024-08-09 18:34:20,339 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-09 18:34:20,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=139600.0, ans=0.2 2024-08-09 18:34:20,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=139600.0, ans=0.05 2024-08-09 18:34:28,885 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 18:34:37,265 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.832e-01 2024-08-09 18:34:42,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=139800.0, ans=0.125 2024-08-09 18:34:45,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=139800.0, ans=0.1 2024-08-09 18:34:51,189 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-09 18:34:58,331 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-09 18:35:09,076 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 3.055e+01 3.459e+01 4.049e+01 5.260e+01, threshold=6.917e+01, percent-clipped=0.0 2024-08-09 18:35:09,099 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 14000, loss[loss=0.1439, beats_loss=0.01162, ecapa_loss=0.0004384, whisper_loss=0.1279, over 14937.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01295, ecapa_loss=0.0003738, whisper_loss=0.1037, over 3916871.10 frames. 
], batch size: 62, lr: 3.09e-02, grad_scale: 4096.0 2024-08-09 18:35:24,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.67 vs. limit=22.5 2024-08-09 18:35:43,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=140200.0, ans=0.2 2024-08-09 18:35:45,021 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2024-08-09 18:36:07,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140400.0, ans=0.1 2024-08-09 18:36:15,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=140400.0, ans=0.0 2024-08-09 18:36:18,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 14050, loss[loss=0.1378, beats_loss=0.01208, ecapa_loss=0.0003275, whisper_loss=0.1224, over 24117.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01295, ecapa_loss=0.0003723, whisper_loss=0.1036, over 3913951.67 frames. ], batch size: 90, lr: 3.09e-02, grad_scale: 4096.0 2024-08-09 18:36:42,424 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 18:36:42,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=140600.0, ans=22.5 2024-08-09 18:36:46,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=140700.0, ans=0.0 2024-08-09 18:36:50,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=140700.0, ans=0.0 2024-08-09 18:36:53,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=140700.0, ans=0.2 2024-08-09 18:36:57,843 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 19 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-09 18:36:59,041 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-09 18:37:07,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=140800.0, ans=0.0 2024-08-09 18:37:08,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=140800.0, ans=0.125 2024-08-09 18:37:18,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=140900.0, ans=0.05 2024-08-09 18:37:24,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-09 18:37:27,625 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 3.062e+01 3.430e+01 4.130e+01 6.899e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-09 18:37:27,646 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 14100, loss[loss=0.1379, beats_loss=0.01203, ecapa_loss=0.000418, whisper_loss=0.1217, over 23635.00 frames. 
], tot_loss[loss=0.1203, beats_loss=0.01296, ecapa_loss=0.0003724, whisper_loss=0.1037, over 3935442.09 frames. ], batch size: 94, lr: 3.08e-02, grad_scale: 4096.0 2024-08-09 18:37:28,234 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 18:37:31,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=141000.0, ans=0.125 2024-08-09 18:37:51,117 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-09 18:37:58,228 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-09 18:37:59,873 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 18:38:17,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=141300.0, ans=0.125 2024-08-09 18:38:18,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=141300.0, ans=0.0 2024-08-09 18:38:18,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=141300.0, ans=0.125 2024-08-09 18:38:18,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.44 vs. limit=22.5 2024-08-09 18:38:22,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.05 vs. 
limit=15.0 2024-08-09 18:38:25,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=141400.0, ans=0.125 2024-08-09 18:38:36,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. limit=6.0 2024-08-09 18:38:37,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 14150, loss[loss=0.1357, beats_loss=0.009056, ecapa_loss=0.0004734, whisper_loss=0.1219, over 21651.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01301, ecapa_loss=0.0003733, whisper_loss=0.1031, over 3907504.64 frames. ], batch size: 87, lr: 3.08e-02, grad_scale: 4096.0 2024-08-09 18:38:43,588 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2024-08-09 18:38:45,240 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=15.0 2024-08-09 18:38:49,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.95 vs. limit=6.0 2024-08-09 18:39:00,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141600.0, ans=0.1 2024-08-09 18:39:20,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=141800.0, ans=0.0 2024-08-09 18:39:31,498 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 28 from LS+wenet, 8 from Vox, 21 fro AS 2024-08-09 18:39:33,163 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
21 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-09 18:39:42,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=141900.0, ans=0.05 2024-08-09 18:39:48,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.51 vs. limit=15.0 2024-08-09 18:39:48,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 3.107e+01 3.530e+01 4.182e+01 6.705e+01, threshold=7.061e+01, percent-clipped=0.0 2024-08-09 18:39:48,638 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 14200, loss[loss=0.1067, beats_loss=0.01357, ecapa_loss=0.0003529, whisper_loss=0.08963, over 20913.00 frames. ], tot_loss[loss=0.1197, beats_loss=0.01294, ecapa_loss=0.0003739, whisper_loss=0.103, over 3893383.10 frames. ], batch size: 82, lr: 3.08e-02, grad_scale: 4096.0 2024-08-09 18:39:56,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142000.0, ans=0.1 2024-08-09 18:40:02,352 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-09 18:40:12,516 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-09 18:40:17,102 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 18:40:32,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142300.0, ans=0.1 2024-08-09 18:40:44,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=142300.0, ans=0.0 2024-08-09 18:40:53,670 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
23 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-09 18:40:55,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.21 vs. limit=15.0 2024-08-09 18:40:58,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=142400.0, ans=0.0 2024-08-09 18:41:04,233 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 14250, loss[loss=0.1088, beats_loss=0.01192, ecapa_loss=0.0003348, whisper_loss=0.09357, over 18456.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01292, ecapa_loss=0.0003722, whisper_loss=0.1035, over 3909178.82 frames. ], batch size: 72, lr: 3.07e-02, grad_scale: 4096.0 2024-08-09 18:41:07,595 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-09 18:41:15,250 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 18:41:25,826 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-09 18:41:33,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=142700.0, ans=0.0 2024-08-09 18:41:40,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.39 vs. limit=12.0 2024-08-09 18:41:47,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=142700.0, ans=0.0 2024-08-09 18:41:53,213 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 18:41:55,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. limit=6.0 2024-08-09 18:41:56,136 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
22 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 18:41:59,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=142800.0, ans=0.1 2024-08-09 18:42:09,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142900.0, ans=0.1 2024-08-09 18:42:19,819 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.991e+01 3.300e+01 4.002e+01 6.725e+01, threshold=6.600e+01, percent-clipped=0.0 2024-08-09 18:42:19,840 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 14300, loss[loss=0.08458, beats_loss=0.0172, ecapa_loss=0.000232, whisper_loss=0.06506, over 18293.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01298, ecapa_loss=0.0003685, whisper_loss=0.103, over 3929792.21 frames. ], batch size: 75, lr: 3.07e-02, grad_scale: 4096.0 2024-08-09 18:42:20,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=143000.0, ans=0.125 2024-08-09 18:42:23,415 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 22 from LS+wenet, 18 from Vox, 13 fro AS 2024-08-09 18:42:38,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=143100.0, ans=0.125 2024-08-09 18:42:47,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=143100.0, ans=0.0 2024-08-09 18:43:11,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143300.0, ans=0.1 2024-08-09 18:43:33,123 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 14350, loss[loss=0.1197, beats_loss=0.01378, ecapa_loss=0.0004069, whisper_loss=0.1018, over 22138.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01302, ecapa_loss=0.0003688, whisper_loss=0.1021, over 3947714.38 frames. 
], batch size: 93, lr: 3.06e-02, grad_scale: 4096.0 2024-08-09 18:43:43,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=143500.0, ans=0.0 2024-08-09 18:43:52,823 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.681e+02 2024-08-09 18:44:08,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=15.0 2024-08-09 18:44:09,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=143700.0, ans=0.125 2024-08-09 18:44:12,240 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-09 18:44:14,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=143700.0, ans=0.0 2024-08-09 18:44:22,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=143800.0, ans=0.125 2024-08-09 18:44:30,167 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 28 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-09 18:44:35,963 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.07 vs. limit=15.0 2024-08-09 18:44:47,769 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-09 18:44:48,775 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.978e+01 3.379e+01 3.872e+01 1.013e+02, threshold=6.758e+01, percent-clipped=3.0 2024-08-09 18:44:48,799 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 14400, loss[loss=0.1298, beats_loss=0.01404, ecapa_loss=0.0003776, whisper_loss=0.112, over 20037.00 frames. 
], tot_loss[loss=0.1193, beats_loss=0.01297, ecapa_loss=0.000369, whisper_loss=0.1027, over 3930965.82 frames. ], batch size: 80, lr: 3.06e-02, grad_scale: 4096.0 2024-08-09 18:44:52,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=144000.0, ans=0.125 2024-08-09 18:45:03,856 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-09 18:45:07,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=144100.0, ans=0.2 2024-08-09 18:45:21,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=144200.0, ans=0.125 2024-08-09 18:45:51,525 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-09 18:45:59,307 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-09 18:46:00,605 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-09 18:46:01,638 INFO [train_multi_KD3.py:1116] (2/4) Epoch 1, batch 14450, loss[loss=0.1251, beats_loss=0.0132, ecapa_loss=0.0003067, whisper_loss=0.1088, over 22023.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01309, ecapa_loss=0.0003701, whisper_loss=0.1018, over 3951500.45 frames. ], batch size: 85, lr: 3.05e-02, grad_scale: 4096.0 2024-08-09 18:46:05,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=144500.0, ans=0.2 2024-08-09 18:46:06,307 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
15 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-09 18:46:10,779 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.667e+02 2024-08-09 18:46:27,774 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 18:46:34,755 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-09 18:46:35,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=144700.0, ans=0.125 2024-08-09 18:46:42,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=144700.0, ans=0.025 2024-08-09 18:46:44,788 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-09 18:46:46,321 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-09 18:46:51,726 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-09 18:46:54,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=144800.0, ans=0.1 2024-08-09 18:47:50,813 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-09 18:47:51,897 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 0, loss[loss=0.1096, beats_loss=0.01488, ecapa_loss=0.0003705, whisper_loss=0.09101, over 20853.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01488, ecapa_loss=0.0003705, whisper_loss=0.09101, over 20853.00 frames. 
], batch size: 84, lr: 2.99e-02, grad_scale: 4096.0 2024-08-09 18:47:51,897 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-09 18:48:33,861 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on ASR_libri: loss=0.287, beats_loss=0, ecapa_loss=0.001066, whisper_loss=0.2763, over 922467.00 frames. 2024-08-09 18:48:50,310 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on SV_voxceleb1: loss=0.009611, beats_loss=0, ecapa_loss=0.0009611, whisper_loss=0, over 939242.00 frames. 2024-08-09 18:50:53,508 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on AT_audioset: loss=0.0306, beats_loss=0.0306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 18:50:53,512 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-09 18:50:53,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=144980.0, ans=0.0 2024-08-09 18:50:56,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.997e+01 3.426e+01 4.261e+01 6.161e+01, threshold=6.853e+01, percent-clipped=0.0 2024-08-09 18:51:04,044 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-09 18:51:37,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=145080.0, ans=0.2 2024-08-09 18:52:19,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=145280.0, ans=0.125 2024-08-09 18:52:38,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=145380.0, ans=0.125 2024-08-09 18:53:03,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 50, loss[loss=0.1235, beats_loss=0.01338, ecapa_loss=0.0003854, whisper_loss=0.1063, over 20079.00 frames. 
], tot_loss[loss=0.1187, beats_loss=0.01351, ecapa_loss=0.0003817, whisper_loss=0.1014, over 883773.97 frames. ], batch size: 78, lr: 2.99e-02, grad_scale: 4096.0 2024-08-09 18:53:11,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=145480.0, ans=0.125 2024-08-09 18:53:14,486 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0 2024-08-09 18:53:41,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=145580.0, ans=0.0 2024-08-09 18:53:44,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=145580.0, ans=0.0 2024-08-09 18:54:42,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.80 vs. limit=15.0 2024-08-09 18:54:43,937 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 18:54:52,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=145880.0, ans=0.125 2024-08-09 18:54:55,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=145880.0, ans=0.0 2024-08-09 18:55:03,636 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 100, loss[loss=0.09115, beats_loss=0.01652, ecapa_loss=0.0003868, whisper_loss=0.07076, over 14511.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01341, ecapa_loss=0.0003686, whisper_loss=0.1001, over 1560525.06 frames. 
], batch size: 60, lr: 2.98e-02, grad_scale: 4096.0 2024-08-09 18:55:07,827 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 3.227e+01 3.507e+01 4.114e+01 7.130e+01, threshold=7.014e+01, percent-clipped=1.0 2024-08-09 18:55:28,834 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-09 18:56:05,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=146180.0, ans=0.125 2024-08-09 18:56:22,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=146280.0, ans=0.0 2024-08-09 18:56:33,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146380.0, ans=0.1 2024-08-09 18:56:40,566 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-09 18:56:43,694 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-09 18:56:43,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=146380.0, ans=0.125 2024-08-09 18:56:53,492 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 150, loss[loss=0.09216, beats_loss=0.01416, ecapa_loss=0.000391, whisper_loss=0.07408, over 16314.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01321, ecapa_loss=0.0003649, whisper_loss=0.1019, over 2054450.39 frames. ], batch size: 65, lr: 2.98e-02, grad_scale: 4096.0 2024-08-09 18:56:57,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=146480.0, ans=0.125 2024-08-09 18:57:02,984 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
24 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 18:57:28,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.33 vs. limit=15.0 2024-08-09 18:57:30,352 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.23 vs. limit=22.5 2024-08-09 18:57:37,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=146680.0, ans=10.0 2024-08-09 18:57:42,482 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 18:57:58,347 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-09 18:58:07,075 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0 2024-08-09 18:58:08,865 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.63 vs. limit=15.0 2024-08-09 18:58:09,908 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 18:58:20,521 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 200, loss[loss=0.1299, beats_loss=0.01008, ecapa_loss=0.0004152, whisper_loss=0.1156, over 23044.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01301, ecapa_loss=0.0003627, whisper_loss=0.103, over 2441328.62 frames. 
], batch size: 91, lr: 2.97e-02, grad_scale: 4096.0 2024-08-09 18:58:23,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.970e+01 3.444e+01 4.293e+01 6.916e+01, threshold=6.888e+01, percent-clipped=0.0 2024-08-09 18:58:31,252 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0 2024-08-09 18:58:35,517 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-09 18:58:50,682 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 18:59:11,257 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.93 vs. limit=15.0 2024-08-09 18:59:39,056 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 250, loss[loss=0.09222, beats_loss=0.01575, ecapa_loss=0.0002892, whisper_loss=0.07358, over 17003.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01286, ecapa_loss=0.0003601, whisper_loss=0.1032, over 2751661.07 frames. ], batch size: 67, lr: 2.97e-02, grad_scale: 4096.0 2024-08-09 18:59:40,563 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
22 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-09 18:59:48,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=147480.0, ans=0.0 2024-08-09 19:00:00,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=147580.0, ans=0.2 2024-08-09 19:00:09,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=147680.0, ans=0.0 2024-08-09 19:00:29,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=147780.0, ans=10.0 2024-08-09 19:00:54,255 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 300, loss[loss=0.1159, beats_loss=0.01187, ecapa_loss=0.0004172, whisper_loss=0.09991, over 15626.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01283, ecapa_loss=0.0003542, whisper_loss=0.1032, over 2998759.12 frames. ], batch size: 66, lr: 2.97e-02, grad_scale: 4096.0 2024-08-09 19:00:57,400 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 3.134e+01 3.449e+01 4.098e+01 7.776e+01, threshold=6.897e+01, percent-clipped=1.0 2024-08-09 19:01:00,571 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 13 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-09 19:01:18,598 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 34 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-09 19:01:18,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=148080.0, ans=0.0 2024-08-09 19:01:38,288 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=15.0 2024-08-09 19:01:46,985 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-09 19:01:52,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=148380.0, ans=0.1 2024-08-09 19:01:54,388 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-09 19:02:05,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=148380.0, ans=0.2 2024-08-09 19:02:08,100 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 350, loss[loss=0.1142, beats_loss=0.01419, ecapa_loss=0.0003177, whisper_loss=0.09685, over 18955.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01291, ecapa_loss=0.0003534, whisper_loss=0.1023, over 3205355.06 frames. ], batch size: 75, lr: 2.96e-02, grad_scale: 4096.0 2024-08-09 19:02:11,252 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-09 19:02:14,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=148480.0, ans=0.125 2024-08-09 19:02:15,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0 2024-08-09 19:02:24,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=148580.0, ans=0.125 2024-08-09 19:02:26,679 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.11 vs. limit=15.0 2024-08-09 19:02:39,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=148680.0, ans=0.1 2024-08-09 19:03:09,614 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
26 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-09 19:03:15,591 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-09 19:03:23,104 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 400, loss[loss=0.097, beats_loss=0.01357, ecapa_loss=0.0003452, whisper_loss=0.07998, over 22014.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01276, ecapa_loss=0.000351, whisper_loss=0.1019, over 3308781.42 frames. ], batch size: 89, lr: 2.96e-02, grad_scale: 4096.0 2024-08-09 19:03:25,563 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.813e+01 3.235e+01 3.879e+01 6.977e+01, threshold=6.469e+01, percent-clipped=1.0 2024-08-09 19:03:32,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=148980.0, ans=0.2 2024-08-09 19:03:38,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.83 vs. limit=22.5 2024-08-09 19:03:43,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=149080.0, ans=0.125 2024-08-09 19:03:53,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=149180.0, ans=0.125 2024-08-09 19:04:19,912 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
30 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 19:04:25,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=149380.0, ans=0.125 2024-08-09 19:04:27,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=149380.0, ans=0.1 2024-08-09 19:04:38,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 450, loss[loss=0.1178, beats_loss=0.01038, ecapa_loss=0.0003205, whisper_loss=0.1043, over 16062.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01272, ecapa_loss=0.0003487, whisper_loss=0.1017, over 3426833.26 frames. ], batch size: 60, lr: 2.95e-02, grad_scale: 4096.0 2024-08-09 19:04:42,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=149480.0, ans=0.1 2024-08-09 19:04:48,209 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 19:04:57,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=149580.0, ans=0.0 2024-08-09 19:05:04,635 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-09 19:05:09,852 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-09 19:05:11,831 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.80 vs. limit=22.5 2024-08-09 19:05:27,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=149780.0, ans=0.2 2024-08-09 19:05:54,131 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 500, loss[loss=0.1071, beats_loss=0.01308, ecapa_loss=0.0003287, whisper_loss=0.09075, over 21910.00 frames. 
], tot_loss[loss=0.1187, beats_loss=0.01266, ecapa_loss=0.0003451, whisper_loss=0.1026, over 3561850.73 frames. ], batch size: 88, lr: 2.95e-02, grad_scale: 4096.0 2024-08-09 19:05:57,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.962e+01 3.493e+01 4.226e+01 6.986e+01, threshold=6.987e+01, percent-clipped=1.0 2024-08-09 19:06:16,731 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-09 19:06:26,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=150180.0, ans=0.125 2024-08-09 19:06:38,652 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 19:06:41,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=150280.0, ans=0.125 2024-08-09 19:06:50,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=150280.0, ans=0.0 2024-08-09 19:06:53,020 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 19:06:53,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=150280.0, ans=0.0 2024-08-09 19:07:01,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=150380.0, ans=0.125 2024-08-09 19:07:07,965 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-09 19:07:10,307 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 550, loss[loss=0.08249, beats_loss=0.01201, ecapa_loss=0.0002775, whisper_loss=0.06771, over 19081.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01265, ecapa_loss=0.0003453, whisper_loss=0.1021, over 3614636.07 frames. 
], batch size: 72, lr: 2.95e-02, grad_scale: 4096.0 2024-08-09 19:07:12,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150480.0, ans=0.1 2024-08-09 19:07:12,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=150480.0, ans=0.0 2024-08-09 19:07:20,638 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 19:07:32,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=150580.0, ans=0.2 2024-08-09 19:07:50,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=150680.0, ans=0.0 2024-08-09 19:07:58,088 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 19:08:00,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.83 vs. limit=6.0 2024-08-09 19:08:01,429 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-09 19:08:05,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=150780.0, ans=0.0 2024-08-09 19:08:10,634 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 
33 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 19:08:10,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=150880.0, ans=0.125 2024-08-09 19:08:13,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=150880.0, ans=0.0 2024-08-09 19:08:26,069 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 600, loss[loss=0.1297, beats_loss=0.01282, ecapa_loss=0.0003464, whisper_loss=0.1134, over 16385.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01261, ecapa_loss=0.0003427, whisper_loss=0.103, over 3633789.41 frames. ], batch size: 62, lr: 2.94e-02, grad_scale: 4096.0 2024-08-09 19:08:26,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=150980.0, ans=0.0 2024-08-09 19:08:28,795 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.923e+01 3.308e+01 3.857e+01 5.897e+01, threshold=6.616e+01, percent-clipped=0.0 2024-08-09 19:08:34,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=150980.0, ans=0.125 2024-08-09 19:08:44,189 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-09 19:08:46,043 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.33 vs. limit=22.5 2024-08-09 19:08:57,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=151180.0, ans=0.125 2024-08-09 19:09:00,287 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 19:09:00,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=151180.0, ans=0.0 2024-08-09 19:09:00,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0 2024-08-09 19:09:08,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=151180.0, ans=15.0 2024-08-09 19:09:09,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=151280.0, ans=0.125 2024-08-09 19:09:36,212 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-09 19:09:40,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 650, loss[loss=0.1296, beats_loss=0.01372, ecapa_loss=0.000289, whisper_loss=0.113, over 16679.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.0127, ecapa_loss=0.0003448, whisper_loss=0.1027, over 3672035.48 frames. ], batch size: 62, lr: 2.94e-02, grad_scale: 4096.0 2024-08-09 19:09:46,797 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=15.0 2024-08-09 19:09:52,450 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0 2024-08-09 19:10:11,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.86 vs. limit=15.0 2024-08-09 19:10:14,928 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-09 19:10:31,150 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
23 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-09 19:10:31,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=151780.0, ans=0.04949747468305833 2024-08-09 19:10:31,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=151780.0, ans=0.0 2024-08-09 19:10:34,813 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. limit=6.0 2024-08-09 19:10:48,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=151880.0, ans=0.2 2024-08-09 19:10:54,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=151980.0, ans=0.0 2024-08-09 19:10:55,128 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 700, loss[loss=0.1411, beats_loss=0.01229, ecapa_loss=0.0003525, whisper_loss=0.1252, over 23934.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01269, ecapa_loss=0.0003438, whisper_loss=0.102, over 3681917.79 frames. ], batch size: 94, lr: 2.94e-02, grad_scale: 4096.0 2024-08-09 19:10:57,520 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=12.0 2024-08-09 19:10:57,915 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.682e+01 3.217e+01 3.765e+01 7.105e+01, threshold=6.434e+01, percent-clipped=1.0 2024-08-09 19:10:59,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=151980.0, ans=0.125 2024-08-09 19:11:02,466 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 19:11:04,543 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
32 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-09 19:11:24,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=152180.0, ans=0.0 2024-08-09 19:12:10,101 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 750, loss[loss=0.1392, beats_loss=0.01447, ecapa_loss=0.0003355, whisper_loss=0.1214, over 22847.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.0127, ecapa_loss=0.0003416, whisper_loss=0.1021, over 3759352.70 frames. ], batch size: 88, lr: 2.93e-02, grad_scale: 4096.0 2024-08-09 19:12:18,004 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-09 19:12:33,981 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-09 19:12:57,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152780.0, ans=0.1 2024-08-09 19:13:00,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.67 vs. limit=15.0 2024-08-09 19:13:04,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-09 19:13:26,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=152980.0, ans=0.95 2024-08-09 19:13:26,871 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 800, loss[loss=0.1357, beats_loss=0.012, ecapa_loss=0.0002875, whisper_loss=0.1208, over 21045.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01268, ecapa_loss=0.0003404, whisper_loss=0.1018, over 3751735.34 frames. 
], batch size: 77, lr: 2.93e-02, grad_scale: 4096.0 2024-08-09 19:13:30,113 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 2.796e+01 3.224e+01 3.871e+01 5.736e+01, threshold=6.448e+01, percent-clipped=0.0 2024-08-09 19:13:34,487 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 19:13:40,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=153080.0, ans=0.1 2024-08-09 19:13:50,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=153080.0, ans=0.125 2024-08-09 19:13:58,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=153180.0, ans=22.5 2024-08-09 19:14:14,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=153280.0, ans=0.0 2024-08-09 19:14:17,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=153280.0, ans=0.125 2024-08-09 19:14:18,980 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 19:14:32,976 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-09 19:14:36,398 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0 2024-08-09 19:14:43,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 850, loss[loss=0.1263, beats_loss=0.01015, ecapa_loss=0.0003805, whisper_loss=0.1124, over 23678.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01266, ecapa_loss=0.0003415, whisper_loss=0.1012, over 3744249.79 frames. 
], batch size: 92, lr: 2.92e-02, grad_scale: 4096.0 2024-08-09 19:14:43,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=153480.0, ans=0.125 2024-08-09 19:14:53,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=153480.0, ans=0.125 2024-08-09 19:15:04,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=15.0 2024-08-09 19:15:07,074 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-09 19:15:17,227 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.09 vs. limit=6.0 2024-08-09 19:15:20,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=153680.0, ans=0.0 2024-08-09 19:15:32,193 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-09 19:15:49,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=153880.0, ans=0.125 2024-08-09 19:16:02,494 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 900, loss[loss=0.1215, beats_loss=0.01465, ecapa_loss=0.0003137, whisper_loss=0.1037, over 15435.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01261, ecapa_loss=0.000339, whisper_loss=0.1018, over 3760799.50 frames. ], batch size: 58, lr: 2.92e-02, grad_scale: 4096.0 2024-08-09 19:16:05,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.273e+01 2.893e+01 3.249e+01 3.934e+01 7.637e+01, threshold=6.497e+01, percent-clipped=1.0 2024-08-09 19:16:12,073 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-09 19:16:15,066 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-09 19:16:27,568 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-09 19:16:38,385 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 18 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 19:16:46,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=154180.0, ans=0.125 2024-08-09 19:16:54,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=154280.0, ans=0.2 2024-08-09 19:16:54,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=154280.0, ans=0.125 2024-08-09 19:17:03,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=154380.0, ans=0.125 2024-08-09 19:17:16,843 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-09 19:17:18,310 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-09 19:17:19,531 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 950, loss[loss=0.1248, beats_loss=0.01242, ecapa_loss=0.0003833, whisper_loss=0.1085, over 20875.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01276, ecapa_loss=0.0003365, whisper_loss=0.1011, over 3739149.54 frames. 
], batch size: 87, lr: 2.92e-02, grad_scale: 4096.0 2024-08-09 19:17:47,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=154580.0, ans=0.1 2024-08-09 19:17:50,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=154680.0, ans=0.125 2024-08-09 19:18:09,182 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 20 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-09 19:18:09,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=154780.0, ans=0.09899494936611666 2024-08-09 19:18:12,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=154780.0, ans=0.0 2024-08-09 19:18:19,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=154780.0, ans=0.125 2024-08-09 19:18:23,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=154880.0, ans=0.125 2024-08-09 19:18:26,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=154880.0, ans=0.0 2024-08-09 19:18:37,864 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1000, loss[loss=0.1152, beats_loss=0.01506, ecapa_loss=0.0002259, whisper_loss=0.09792, over 15477.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01283, ecapa_loss=0.0003365, whisper_loss=0.1017, over 3768979.00 frames. 
], batch size: 56, lr: 2.91e-02, grad_scale: 4096.0 2024-08-09 19:18:41,085 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.201e+01 2.941e+01 3.307e+01 3.877e+01 7.420e+01, threshold=6.613e+01, percent-clipped=2.0 2024-08-09 19:18:47,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=154980.0, ans=0.0 2024-08-09 19:19:02,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=155080.0, ans=0.125 2024-08-09 19:19:11,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=12.0 2024-08-09 19:19:33,783 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 19:19:37,308 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-09 19:19:40,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=155280.0, ans=0.125 2024-08-09 19:19:43,454 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 32 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 19:19:56,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2024-08-09 19:19:59,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1050, loss[loss=0.1231, beats_loss=0.01106, ecapa_loss=0.0004174, whisper_loss=0.1079, over 19892.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01279, ecapa_loss=0.0003355, whisper_loss=0.1022, over 3781968.73 frames. ], batch size: 81, lr: 2.91e-02, grad_scale: 4096.0 2024-08-09 19:20:01,916 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. 
limit=15.0 2024-08-09 19:20:02,726 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 19:20:07,511 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-09 19:20:16,411 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 28 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 19:20:31,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=155680.0, ans=0.125 2024-08-09 19:20:36,683 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.441e+02 2024-08-09 19:20:41,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=155680.0, ans=0.0 2024-08-09 19:20:44,791 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-09 19:20:51,752 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-09 19:20:58,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=155880.0, ans=0.125 2024-08-09 19:21:01,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=155880.0, ans=0.125 2024-08-09 19:21:13,802 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1100, loss[loss=0.1212, beats_loss=0.01299, ecapa_loss=0.0002925, whisper_loss=0.1053, over 20355.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01285, ecapa_loss=0.0003344, whisper_loss=0.1016, over 3784790.17 frames. ], batch size: 78, lr: 2.90e-02, grad_scale: 4096.0 2024-08-09 19:21:16,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.59 vs. 
limit=22.5 2024-08-09 19:21:17,118 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.935e+01 3.266e+01 4.117e+01 7.646e+01, threshold=6.532e+01, percent-clipped=3.0 2024-08-09 19:21:37,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=156080.0, ans=0.125 2024-08-09 19:21:39,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=156080.0, ans=0.125 2024-08-09 19:21:40,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=156080.0, ans=0.125 2024-08-09 19:21:55,632 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 19:22:00,049 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 19:22:03,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=156280.0, ans=0.2 2024-08-09 19:22:08,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=156280.0, ans=0.07 2024-08-09 19:22:23,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=156480.0, ans=0.0 2024-08-09 19:22:24,106 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1150, loss[loss=0.1198, beats_loss=0.01483, ecapa_loss=0.00032, whisper_loss=0.1017, over 22713.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01289, ecapa_loss=0.0003311, whisper_loss=0.1017, over 3811150.71 frames. ], batch size: 90, lr: 2.90e-02, grad_scale: 4096.0 2024-08-09 19:22:29,278 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 19:22:34,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=156480.0, ans=0.1 2024-08-09 19:22:49,737 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-09 19:22:50,968 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 19:23:00,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=156680.0, ans=0.125 2024-08-09 19:23:08,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=156780.0, ans=0.125 2024-08-09 19:23:14,534 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 19:23:30,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1200, loss[loss=0.1145, beats_loss=0.01617, ecapa_loss=0.0003364, whisper_loss=0.09499, over 16157.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01293, ecapa_loss=0.0003303, whisper_loss=0.1017, over 3831796.08 frames. ], batch size: 64, lr: 2.90e-02, grad_scale: 4096.0 2024-08-09 19:23:33,116 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.304e+01 2.894e+01 3.270e+01 3.890e+01 7.018e+01, threshold=6.539e+01, percent-clipped=1.0 2024-08-09 19:23:40,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=156980.0, ans=0.05 2024-08-09 19:23:44,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=157080.0, ans=0.1 2024-08-09 19:23:45,501 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
23 from LS+wenet, 9 from Vox, 33 fro AS 2024-08-09 19:23:49,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=157080.0, ans=22.5 2024-08-09 19:24:08,414 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-09 19:24:18,325 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-09 19:24:20,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2024-08-09 19:24:20,873 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-09 19:24:24,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=157380.0, ans=0.0 2024-08-09 19:24:32,901 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:24:34,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=157380.0, ans=0.0 2024-08-09 19:24:34,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=157380.0, ans=0.0 2024-08-09 19:24:36,069 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1250, loss[loss=0.1063, beats_loss=0.01248, ecapa_loss=0.0003077, whisper_loss=0.09078, over 23209.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01287, ecapa_loss=0.0003281, whisper_loss=0.1025, over 3843062.11 frames. 
], batch size: 91, lr: 2.89e-02, grad_scale: 4096.0 2024-08-09 19:24:39,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=157480.0, ans=0.125 2024-08-09 19:24:44,098 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-09 19:25:00,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=157580.0, ans=0.125 2024-08-09 19:25:06,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=157680.0, ans=0.1 2024-08-09 19:25:26,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=157780.0, ans=0.0 2024-08-09 19:25:35,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=157880.0, ans=0.125 2024-08-09 19:25:40,595 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-09 19:25:41,633 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1300, loss[loss=0.1039, beats_loss=0.01524, ecapa_loss=0.000391, whisper_loss=0.08477, over 14525.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01281, ecapa_loss=0.0003292, whisper_loss=0.1023, over 3817307.06 frames. ], batch size: 64, lr: 2.89e-02, grad_scale: 4096.0 2024-08-09 19:25:44,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.862e+01 3.141e+01 3.804e+01 7.057e+01, threshold=6.283e+01, percent-clipped=1.0 2024-08-09 19:25:54,834 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-09 19:26:20,181 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
22 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-09 19:26:21,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=158280.0, ans=0.0 2024-08-09 19:26:29,405 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 19:26:46,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=158480.0, ans=0.125 2024-08-09 19:26:47,421 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1350, loss[loss=0.1124, beats_loss=0.01028, ecapa_loss=0.0003156, whisper_loss=0.09894, over 18506.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01278, ecapa_loss=0.0003296, whisper_loss=0.1026, over 3822180.00 frames. ], batch size: 72, lr: 2.89e-02, grad_scale: 4096.0 2024-08-09 19:27:11,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=158580.0, ans=0.0 2024-08-09 19:27:13,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=158680.0, ans=0.1 2024-08-09 19:27:17,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=158680.0, ans=0.1 2024-08-09 19:27:37,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=158780.0, ans=0.125 2024-08-09 19:27:42,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=158880.0, ans=0.1 2024-08-09 19:27:43,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=158880.0, ans=0.0 2024-08-09 19:27:44,849 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 19:27:53,846 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1400, loss[loss=0.1334, beats_loss=0.01317, ecapa_loss=0.000353, whisper_loss=0.1167, over 23100.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01269, ecapa_loss=0.0003299, whisper_loss=0.1027, over 3843338.54 frames. ], batch size: 92, lr: 2.88e-02, grad_scale: 4096.0 2024-08-09 19:27:56,774 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.826e+01 3.197e+01 3.856e+01 5.556e+01, threshold=6.395e+01, percent-clipped=0.0 2024-08-09 19:27:59,539 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 39 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-09 19:28:00,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=158980.0, ans=0.2 2024-08-09 19:28:10,890 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 28 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-09 19:28:26,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=159180.0, ans=0.125 2024-08-09 19:28:28,363 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.28 vs. limit=10.0 2024-08-09 19:28:53,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=159380.0, ans=0.125 2024-08-09 19:28:56,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=159380.0, ans=0.125 2024-08-09 19:29:00,235 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1450, loss[loss=0.126, beats_loss=0.01084, ecapa_loss=0.0003181, whisper_loss=0.112, over 17919.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01263, ecapa_loss=0.0003298, whisper_loss=0.1031, over 3835248.14 frames. 
], batch size: 68, lr: 2.88e-02, grad_scale: 4096.0 2024-08-09 19:29:41,120 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2024-08-09 19:29:42,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=159580.0, ans=0.0 2024-08-09 19:29:43,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=159580.0, ans=0.0 2024-08-09 19:29:47,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=159580.0, ans=0.125 2024-08-09 19:30:01,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=159680.0, ans=0.2 2024-08-09 19:30:12,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=159780.0, ans=0.0 2024-08-09 19:30:31,877 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 19:30:32,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2024-08-09 19:30:34,405 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1500, loss[loss=0.1274, beats_loss=0.01143, ecapa_loss=0.0003282, whisper_loss=0.1127, over 18694.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01265, ecapa_loss=0.0003304, whisper_loss=0.1023, over 3842698.47 frames. ], batch size: 73, lr: 2.87e-02, grad_scale: 4096.0 2024-08-09 19:30:39,738 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.965e+01 3.414e+01 4.022e+01 6.981e+01, threshold=6.828e+01, percent-clipped=1.0 2024-08-09 19:30:51,664 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
13 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 19:30:52,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=160080.0, ans=0.04949747468305833 2024-08-09 19:30:53,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2024-08-09 19:30:57,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=160080.0, ans=0.025 2024-08-09 19:31:04,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=160180.0, ans=0.0 2024-08-09 19:31:13,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=160180.0, ans=0.125 2024-08-09 19:31:31,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=160280.0, ans=0.0 2024-08-09 19:31:36,563 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 19:31:42,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=160380.0, ans=0.0 2024-08-09 19:31:53,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=160480.0, ans=0.0 2024-08-09 19:31:54,044 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1550, loss[loss=0.101, beats_loss=0.01562, ecapa_loss=0.0002554, whisper_loss=0.08286, over 15349.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01272, ecapa_loss=0.0003279, whisper_loss=0.1018, over 3810273.92 frames. ], batch size: 61, lr: 2.87e-02, grad_scale: 8192.0 2024-08-09 19:31:54,210 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
25 from LS+wenet, 12 from Vox, 28 from AS 2024-08-09 19:32:20,572 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 23 from Vox, 21 from AS 2024-08-09 19:32:20,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=160580.0, ans=0.1 2024-08-09 19:32:29,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=160680.0, ans=0.0 2024-08-09 19:32:32,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160680.0, ans=0.1 2024-08-09 19:32:33,271 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 18 from Vox, 36 from AS 2024-08-09 19:33:00,030 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:33:01,441 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 27 from Vox, 27 from AS 2024-08-09 19:33:01,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2024-08-09 19:33:12,165 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1600, loss[loss=0.1127, beats_loss=0.01282, ecapa_loss=0.0003047, whisper_loss=0.09682, over 21645.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01275, ecapa_loss=0.0003261, whisper_loss=0.102, over 3835210.54 frames. 
], batch size: 86, lr: 2.87e-02, grad_scale: 8192.0 2024-08-09 19:33:16,147 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.968e+01 3.450e+01 4.320e+01 7.036e+01, threshold=6.900e+01, percent-clipped=1.0 2024-08-09 19:33:24,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=160980.0, ans=0.125 2024-08-09 19:33:55,253 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 30 from Vox, 34 from AS 2024-08-09 19:34:13,854 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 from AS 2024-08-09 19:34:26,045 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 27 from Vox, 40 from AS 2024-08-09 19:34:30,003 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1650, loss[loss=0.09954, beats_loss=0.01422, ecapa_loss=0.0003091, whisper_loss=0.08223, over 15015.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01262, ecapa_loss=0.0003272, whisper_loss=0.1026, over 3834473.21 frames. ], batch size: 63, lr: 2.86e-02, grad_scale: 8192.0 2024-08-09 19:34:34,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161480.0, ans=0.1 2024-08-09 19:34:35,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=161480.0, ans=0.0 2024-08-09 19:34:37,949 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 23 from Vox, 25 from AS 2024-08-09 19:34:40,706 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 12 from Vox, 34 from AS 2024-08-09 19:34:50,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=161580.0, ans=0.125 2024-08-09 19:35:06,440 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
31 from LS+wenet, 13 from Vox, 35 from AS 2024-08-09 19:35:08,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=161680.0, ans=0.05 2024-08-09 19:35:14,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=161780.0, ans=0.09899494936611666 2024-08-09 19:35:16,080 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:35:27,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=161780.0, ans=0.0 2024-08-09 19:35:42,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=161880.0, ans=0.0 2024-08-09 19:35:43,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=161880.0, ans=0.125 2024-08-09 19:35:45,569 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1700, loss[loss=0.1168, beats_loss=0.0121, ecapa_loss=0.0003017, whisper_loss=0.1016, over 17506.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01259, ecapa_loss=0.0003305, whisper_loss=0.1021, over 3820020.49 frames. ], batch size: 68, lr: 2.86e-02, grad_scale: 8192.0 2024-08-09 19:35:48,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.753e+01 3.153e+01 3.657e+01 6.641e+01, threshold=6.306e+01, percent-clipped=0.0 2024-08-09 19:35:54,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0 2024-08-09 19:35:55,991 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.47 vs. 
limit=8.0 2024-08-09 19:35:58,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=161980.0, ans=0.0 2024-08-09 19:36:01,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.89 vs. limit=15.0 2024-08-09 19:36:14,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=28.46 vs. limit=22.5 2024-08-09 19:36:22,785 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 from AS 2024-08-09 19:36:33,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162280.0, ans=0.1 2024-08-09 19:36:43,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=162280.0, ans=0.125 2024-08-09 19:36:59,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=162480.0, ans=0.125 2024-08-09 19:36:59,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1750, loss[loss=0.1268, beats_loss=0.01093, ecapa_loss=0.000399, whisper_loss=0.1119, over 18744.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01264, ecapa_loss=0.0003293, whisper_loss=0.102, over 3801882.40 frames. ], batch size: 73, lr: 2.86e-02, grad_scale: 8192.0 2024-08-09 19:37:02,371 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=15.0 2024-08-09 19:37:05,816 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
16 from LS+wenet, 10 from Vox, 27 from AS 2024-08-09 19:37:07,696 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=8.148e+02 2024-08-09 19:37:22,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=162580.0, ans=0.125 2024-08-09 19:37:35,108 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 19 from Vox, 33 from AS 2024-08-09 19:37:39,859 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.68 vs. limit=22.5 2024-08-09 19:37:43,313 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 from AS 2024-08-09 19:38:16,214 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1800, loss[loss=0.1253, beats_loss=0.01291, ecapa_loss=0.0003533, whisper_loss=0.1088, over 21835.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01259, ecapa_loss=0.0003313, whisper_loss=0.1024, over 3814054.95 frames. ], batch size: 89, lr: 2.85e-02, grad_scale: 8192.0 2024-08-09 19:38:18,988 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.272e+01 2.809e+01 3.330e+01 3.752e+01 6.796e+01, threshold=6.661e+01, percent-clipped=1.0 2024-08-09 19:38:21,025 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0 2024-08-09 19:38:22,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=162980.0, ans=0.125 2024-08-09 19:38:28,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=162980.0, ans=0.2 2024-08-09 19:38:31,133 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
33 from LS+wenet, 28 from Vox, 31 from AS 2024-08-09 19:38:49,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=163180.0, ans=0.09899494936611666 2024-08-09 19:39:09,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=163280.0, ans=0.125 2024-08-09 19:39:20,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=163380.0, ans=0.0 2024-08-09 19:39:23,180 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.441e-02 2024-08-09 19:39:31,151 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1850, loss[loss=0.1185, beats_loss=0.01029, ecapa_loss=0.000399, whisper_loss=0.1042, over 19417.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01262, ecapa_loss=0.0003286, whisper_loss=0.1031, over 3829735.38 frames. ], batch size: 76, lr: 2.85e-02, grad_scale: 8192.0 2024-08-09 19:39:34,092 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 from AS 2024-08-09 19:39:46,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=163580.0, ans=0.1 2024-08-09 19:40:01,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163680.0, ans=0.1 2024-08-09 19:40:10,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=12.0 2024-08-09 19:40:14,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=163780.0, ans=0.125 2024-08-09 19:40:42,902 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1900, loss[loss=0.1446, beats_loss=0.008233, ecapa_loss=0.0004328, whisper_loss=0.132, over 20723.00 frames. 
], tot_loss[loss=0.1193, beats_loss=0.0126, ecapa_loss=0.0003385, whisper_loss=0.1034, over 3831746.85 frames. ], batch size: 82, lr: 2.85e-02, grad_scale: 8192.0 2024-08-09 19:40:45,629 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.888e+01 3.200e+01 3.675e+01 7.363e+01, threshold=6.401e+01, percent-clipped=1.0 2024-08-09 19:40:46,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=163980.0, ans=0.125 2024-08-09 19:40:56,382 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 23 from Vox, 31 from AS 2024-08-09 19:41:00,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=164080.0, ans=0.0 2024-08-09 19:41:10,208 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 15 from Vox, 42 from AS 2024-08-09 19:41:13,931 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 18 from LS+wenet, 27 from Vox, 46 from AS 2024-08-09 19:41:16,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=164180.0, ans=0.125 2024-08-09 19:41:18,950 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 from AS 2024-08-09 19:41:35,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=164380.0, ans=0.0 2024-08-09 19:41:38,750 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 12 from Vox, 35 from AS 2024-08-09 19:41:49,402 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 1950, loss[loss=0.1006, beats_loss=0.01373, ecapa_loss=0.0003595, whisper_loss=0.08331, over 20997.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01262, ecapa_loss=0.000344, whisper_loss=0.1027, over 3832522.97 frames. 
], batch size: 83, lr: 2.84e-02, grad_scale: 8192.0 2024-08-09 19:41:54,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=164480.0, ans=0.0 2024-08-09 19:42:19,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=164680.0, ans=0.125 2024-08-09 19:42:20,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=164680.0, ans=0.02 2024-08-09 19:42:35,152 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 25 from Vox, 33 from AS 2024-08-09 19:42:36,384 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 33 from LS+wenet, 17 from Vox, 45 from AS 2024-08-09 19:42:52,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=164880.0, ans=0.0 2024-08-09 19:42:55,706 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2000, loss[loss=0.1538, beats_loss=0.01444, ecapa_loss=0.0003086, whisper_loss=0.1363, over 17602.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01275, ecapa_loss=0.0003487, whisper_loss=0.1017, over 3833130.42 frames. ], batch size: 64, lr: 2.84e-02, grad_scale: 8192.0 2024-08-09 19:42:56,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=164980.0, ans=15.0 2024-08-09 19:42:58,186 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.959e+01 3.174e+01 3.680e+01 5.777e+01, threshold=6.348e+01, percent-clipped=0.0 2024-08-09 19:43:05,974 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
20 from LS+wenet, 19 from Vox, 30 from AS 2024-08-09 19:43:20,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=165180.0, ans=0.2 2024-08-09 19:43:22,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.34 vs. limit=15.0 2024-08-09 19:43:24,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=165180.0, ans=0.125 2024-08-09 19:43:26,835 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 from AS 2024-08-09 19:43:40,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=165280.0, ans=0.1 2024-08-09 19:43:57,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=165380.0, ans=0.125 2024-08-09 19:44:01,612 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2050, loss[loss=0.122, beats_loss=0.01215, ecapa_loss=0.0004761, whisper_loss=0.105, over 22113.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01284, ecapa_loss=0.0003519, whisper_loss=0.1014, over 3820777.28 frames. ], batch size: 91, lr: 2.84e-02, grad_scale: 8192.0 2024-08-09 19:44:20,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=165580.0, ans=0.0 2024-08-09 19:44:36,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=165680.0, ans=0.1 2024-08-09 19:44:47,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.06 vs. 
limit=15.0 2024-08-09 19:45:04,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=165880.0, ans=0.0 2024-08-09 19:45:06,777 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2100, loss[loss=0.139, beats_loss=0.01104, ecapa_loss=0.0004, whisper_loss=0.1239, over 21249.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01279, ecapa_loss=0.0003522, whisper_loss=0.1019, over 3833264.22 frames. ], batch size: 85, lr: 2.83e-02, grad_scale: 8192.0 2024-08-09 19:45:09,409 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.923e+01 3.262e+01 4.036e+01 6.421e+01, threshold=6.525e+01, percent-clipped=1.0 2024-08-09 19:45:14,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=165980.0, ans=0.0 2024-08-09 19:45:15,087 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 from AS 2024-08-09 19:45:19,616 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0 2024-08-09 19:45:29,616 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 from AS 2024-08-09 19:45:33,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=166180.0, ans=0.0 2024-08-09 19:45:54,834 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=12.0 2024-08-09 19:45:58,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=166380.0, ans=0.1 2024-08-09 19:46:12,755 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2150, loss[loss=0.0829, beats_loss=0.02033, ecapa_loss=0.0003809, whisper_loss=0.05876, over 20799.00 frames. 
], tot_loss[loss=0.1186, beats_loss=0.01281, ecapa_loss=0.0003533, whisper_loss=0.1022, over 3855755.95 frames. ], batch size: 93, lr: 2.83e-02, grad_scale: 8192.0 2024-08-09 19:46:16,769 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 16 from Vox, 20 from AS 2024-08-09 19:46:48,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=166680.0, ans=0.125 2024-08-09 19:46:49,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=166680.0, ans=0.09899494936611666 2024-08-09 19:46:50,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=166780.0, ans=0.125 2024-08-09 19:47:12,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=166880.0, ans=0.0 2024-08-09 19:47:18,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2200, loss[loss=0.1438, beats_loss=0.01137, ecapa_loss=0.0003158, whisper_loss=0.1292, over 20503.00 frames. ], tot_loss[loss=0.1192, beats_loss=0.01277, ecapa_loss=0.0003528, whisper_loss=0.1029, over 3849672.95 frames. ], batch size: 77, lr: 2.82e-02, grad_scale: 8192.0 2024-08-09 19:47:18,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=166980.0, ans=0.0 2024-08-09 19:47:21,058 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.890e+01 3.143e+01 3.810e+01 5.998e+01, threshold=6.286e+01, percent-clipped=0.0 2024-08-09 19:47:25,100 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 14 from Vox, 23 from AS 2024-08-09 19:47:43,386 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
32 from LS+wenet, 18 from Vox, 35 from AS 2024-08-09 19:47:58,503 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=12.0 2024-08-09 19:48:10,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.96 vs. limit=15.0 2024-08-09 19:48:15,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=167380.0, ans=0.125 2024-08-09 19:48:15,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=167380.0, ans=0.125 2024-08-09 19:48:16,366 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 23 from Vox, 28 from AS 2024-08-09 19:48:23,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2250, loss[loss=0.1137, beats_loss=0.01383, ecapa_loss=0.0003129, whisper_loss=0.09675, over 19283.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01286, ecapa_loss=0.0003563, whisper_loss=0.1031, over 3900415.98 frames. ], batch size: 75, lr: 2.82e-02, grad_scale: 8192.0 2024-08-09 19:48:26,448 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 from AS 2024-08-09 19:48:29,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=167480.0, ans=0.0 2024-08-09 19:48:30,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=167480.0, ans=0.0 2024-08-09 19:48:31,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=167480.0, ans=0.0 2024-08-09 19:48:33,466 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.38 vs. 
limit=22.5 2024-08-09 19:48:34,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=167480.0, ans=0.125 2024-08-09 19:48:43,897 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2024-08-09 19:48:46,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=167580.0, ans=0.2 2024-08-09 19:48:50,884 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 28 from Vox, 20 from AS 2024-08-09 19:49:07,972 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 21 from Vox, 28 from AS 2024-08-09 19:49:28,360 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2300, loss[loss=0.1204, beats_loss=0.01345, ecapa_loss=0.0003329, whisper_loss=0.1036, over 22635.00 frames. ], tot_loss[loss=0.1197, beats_loss=0.01291, ecapa_loss=0.0003532, whisper_loss=0.1032, over 3924622.15 frames. ], batch size: 92, lr: 2.82e-02, grad_scale: 8192.0 2024-08-09 19:49:31,217 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 3.098e+01 3.355e+01 3.897e+01 6.798e+01, threshold=6.710e+01, percent-clipped=2.0 2024-08-09 19:49:31,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=167980.0, ans=0.0 2024-08-09 19:49:39,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=167980.0, ans=0.125 2024-08-09 19:49:47,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=168080.0, ans=0.0 2024-08-09 19:49:56,039 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
24 from LS+wenet, 22 from Vox, 48 from AS 2024-08-09 19:50:05,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=168180.0, ans=0.2 2024-08-09 19:50:15,898 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 from AS 2024-08-09 19:50:19,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=168280.0, ans=0.2 2024-08-09 19:50:23,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=168380.0, ans=0.95 2024-08-09 19:50:34,834 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2350, loss[loss=0.1241, beats_loss=0.0135, ecapa_loss=0.0003787, whisper_loss=0.1068, over 19108.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01293, ecapa_loss=0.0003549, whisper_loss=0.1026, over 3903802.78 frames. ], batch size: 79, lr: 2.81e-02, grad_scale: 8192.0 2024-08-09 19:50:41,494 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 from AS 2024-08-09 19:50:45,859 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:50:48,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=168580.0, ans=0.0 2024-08-09 19:50:58,716 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 from AS 2024-08-09 19:51:01,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=168680.0, ans=0.125 2024-08-09 19:51:06,855 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 21 from Vox, 41 from AS 2024-08-09 19:51:12,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=168680.0, ans=0.1 2024-08-09 19:51:12,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=168680.0, ans=0.125 2024-08-09 19:51:19,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=168780.0, ans=0.125 2024-08-09 19:51:25,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=168780.0, ans=0.125 2024-08-09 19:51:26,328 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 22 from Vox, 33 from AS 2024-08-09 19:51:29,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=168880.0, ans=0.125 2024-08-09 19:51:34,767 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:51:36,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=168880.0, ans=0.0 2024-08-09 19:51:38,172 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 12 from Vox, 33 from AS 2024-08-09 19:51:42,309 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 from AS 2024-08-09 19:51:43,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2400, loss[loss=0.1179, beats_loss=0.01119, ecapa_loss=0.0003176, whisper_loss=0.1035, over 15794.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01287, ecapa_loss=0.0003537, whisper_loss=0.1023, over 3891054.79 frames. 
], batch size: 59, lr: 2.81e-02, grad_scale: 8192.0 2024-08-09 19:51:46,039 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.941e+01 3.344e+01 3.819e+01 6.517e+01, threshold=6.689e+01, percent-clipped=0.0 2024-08-09 19:51:47,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=168980.0, ans=0.0 2024-08-09 19:51:50,046 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 15 from Vox, 38 from AS 2024-08-09 19:52:06,842 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 from AS 2024-08-09 19:52:08,192 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 from AS 2024-08-09 19:52:24,909 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 from AS 2024-08-09 19:52:47,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=169380.0, ans=0.125 2024-08-09 19:52:50,787 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2450, loss[loss=0.1186, beats_loss=0.01053, ecapa_loss=0.0004065, whisper_loss=0.104, over 16313.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01284, ecapa_loss=0.000351, whisper_loss=0.1024, over 3910220.90 frames. ], batch size: 67, lr: 2.81e-02, grad_scale: 8192.0 2024-08-09 19:52:52,480 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 from AS 2024-08-09 19:53:14,585 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 from AS 2024-08-09 19:53:17,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. 
limit=15.0 2024-08-09 19:53:19,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=169680.0, ans=0.95 2024-08-09 19:53:20,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=169680.0, ans=0.0 2024-08-09 19:53:53,987 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 20 from Vox, 40 from AS 2024-08-09 19:54:00,406 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2500, loss[loss=0.1051, beats_loss=0.01432, ecapa_loss=0.0003667, whisper_loss=0.0871, over 20317.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01279, ecapa_loss=0.0003509, whisper_loss=0.1032, over 3920466.11 frames. ], batch size: 85, lr: 2.80e-02, grad_scale: 8192.0 2024-08-09 19:54:03,062 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.848e+01 3.405e+01 3.928e+01 5.880e+01, threshold=6.809e+01, percent-clipped=0.0 2024-08-09 19:54:05,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=24.50 vs. limit=22.5 2024-08-09 19:54:10,006 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS 2024-08-09 19:54:15,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2024-08-09 19:54:16,807 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 10 from Vox, 36 from AS 2024-08-09 19:54:19,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=170080.0, ans=0.07 2024-08-09 19:54:35,317 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
23 from LS+wenet, 15 from Vox, 22 from AS 2024-08-09 19:54:39,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=170180.0, ans=0.0 2024-08-09 19:54:56,566 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 14 from Vox, 25 from AS 2024-08-09 19:55:07,220 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.442e+00 2024-08-09 19:55:12,204 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2550, loss[loss=0.1005, beats_loss=0.01502, ecapa_loss=0.0003119, whisper_loss=0.08236, over 23205.00 frames. ], tot_loss[loss=0.1192, beats_loss=0.01281, ecapa_loss=0.000349, whisper_loss=0.1029, over 3877527.29 frames. ], batch size: 94, lr: 2.80e-02, grad_scale: 8192.0 2024-08-09 19:55:15,713 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 19 from Vox, 28 from AS 2024-08-09 19:55:24,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170480.0, ans=0.1 2024-08-09 19:55:29,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=170580.0, ans=0.125 2024-08-09 19:55:40,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=170580.0, ans=0.0 2024-08-09 19:55:52,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=170680.0, ans=0.125 2024-08-09 19:56:05,295 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
23 from LS+wenet, 19 from Vox, 37 from AS 2024-08-09 19:56:11,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=170880.0, ans=0.125 2024-08-09 19:56:11,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=170880.0, ans=0.125 2024-08-09 19:56:18,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170880.0, ans=0.1 2024-08-09 19:56:25,372 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 from AS 2024-08-09 19:56:26,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2600, loss[loss=0.1084, beats_loss=0.01447, ecapa_loss=0.0003147, whisper_loss=0.09075, over 17008.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01283, ecapa_loss=0.0003491, whisper_loss=0.1032, over 3891853.78 frames. ], batch size: 65, lr: 2.80e-02, grad_scale: 8192.0 2024-08-09 19:56:29,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 3.011e+01 3.512e+01 4.102e+01 7.361e+01, threshold=7.024e+01, percent-clipped=2.0 2024-08-09 19:56:44,837 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 14 from LS+wenet, 19 from Vox, 35 from AS 2024-08-09 19:56:50,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=171080.0, ans=0.125 2024-08-09 19:57:00,074 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 20 from Vox, 19 from AS 2024-08-09 19:57:02,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=171180.0, ans=0.0 2024-08-09 19:57:07,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=171280.0, ans=0.09899494936611666 2024-08-09 19:57:14,415 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
16 from LS+wenet, 17 from Vox, 31 from AS 2024-08-09 19:57:30,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=171380.0, ans=0.125 2024-08-09 19:57:36,877 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2650, loss[loss=0.1119, beats_loss=0.01187, ecapa_loss=0.000405, whisper_loss=0.09596, over 21390.00 frames. ], tot_loss[loss=0.1189, beats_loss=0.01283, ecapa_loss=0.0003515, whisper_loss=0.1025, over 3881655.45 frames. ], batch size: 84, lr: 2.79e-02, grad_scale: 8192.0 2024-08-09 19:57:42,951 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 24 from Vox, 23 from AS 2024-08-09 19:57:47,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=171480.0, ans=0.125 2024-08-09 19:57:49,000 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=12.0 2024-08-09 19:58:11,680 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 from AS 2024-08-09 19:58:20,177 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 from AS 2024-08-09 19:58:34,447 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 from AS 2024-08-09 19:58:40,523 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=15.0 2024-08-09 19:58:45,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=171880.0, ans=0.125 2024-08-09 19:58:48,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2700, loss[loss=0.1307, beats_loss=0.01171, ecapa_loss=0.0004205, whisper_loss=0.1147, over 13546.00 frames. 
], tot_loss[loss=0.1184, beats_loss=0.01287, ecapa_loss=0.0003504, whisper_loss=0.102, over 3868436.91 frames. ], batch size: 57, lr: 2.79e-02, grad_scale: 8192.0 2024-08-09 19:58:51,071 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.909e+01 3.335e+01 3.725e+01 7.583e+01, threshold=6.671e+01, percent-clipped=1.0 2024-08-09 19:59:02,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=172080.0, ans=0.125 2024-08-09 19:59:04,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2024-08-09 19:59:04,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=172080.0, ans=0.0 2024-08-09 19:59:12,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=172080.0, ans=0.125 2024-08-09 19:59:59,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2750, loss[loss=0.1251, beats_loss=0.01212, ecapa_loss=0.0003897, whisper_loss=0.1091, over 22061.00 frames. ], tot_loss[loss=0.1189, beats_loss=0.01282, ecapa_loss=0.0003496, whisper_loss=0.1026, over 3891707.44 frames. ], batch size: 90, lr: 2.79e-02, grad_scale: 8192.0 2024-08-09 20:00:01,418 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.38 vs. 
limit=15.0 2024-08-09 20:00:04,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=172480.0, ans=0.0 2024-08-09 20:00:18,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=172580.0, ans=0.0 2024-08-09 20:00:43,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=172780.0, ans=0.125 2024-08-09 20:00:52,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. limit=15.0 2024-08-09 20:01:04,061 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-09 20:01:12,578 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2800, loss[loss=0.1224, beats_loss=0.0119, ecapa_loss=0.0004346, whisper_loss=0.1062, over 18485.00 frames. ], tot_loss[loss=0.1193, beats_loss=0.01281, ecapa_loss=0.0003478, whisper_loss=0.103, over 3912994.01 frames. ], batch size: 77, lr: 2.78e-02, grad_scale: 8192.0 2024-08-09 20:01:15,189 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 3.001e+01 3.485e+01 3.958e+01 7.033e+01, threshold=6.969e+01, percent-clipped=2.0 2024-08-09 20:01:17,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.25 vs. 
limit=22.5 2024-08-09 20:01:19,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=172980.0, ans=0.125 2024-08-09 20:01:22,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172980.0, ans=0.1 2024-08-09 20:01:26,031 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2024-08-09 20:01:39,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=173180.0, ans=0.0 2024-08-09 20:01:40,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.58 vs. limit=15.0 2024-08-09 20:01:54,088 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 20:01:56,588 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-09 20:02:10,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=173380.0, ans=0.2 2024-08-09 20:02:23,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=173480.0, ans=0.2 2024-08-09 20:02:24,015 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2850, loss[loss=0.1169, beats_loss=0.01019, ecapa_loss=0.0003177, whisper_loss=0.1035, over 19154.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01284, ecapa_loss=0.0003453, whisper_loss=0.1032, over 3918445.50 frames. 
], batch size: 69, lr: 2.78e-02, grad_scale: 8192.0 2024-08-09 20:02:45,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=173580.0, ans=0.0 2024-08-09 20:03:17,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=173780.0, ans=0.2 2024-08-09 20:03:26,294 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-09 20:03:30,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=173880.0, ans=0.09899494936611666 2024-08-09 20:03:36,877 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2900, loss[loss=0.1089, beats_loss=0.01456, ecapa_loss=0.0003641, whisper_loss=0.09067, over 16050.00 frames. ], tot_loss[loss=0.1194, beats_loss=0.01291, ecapa_loss=0.0003472, whisper_loss=0.103, over 3944086.68 frames. ], batch size: 66, lr: 2.78e-02, grad_scale: 8192.0 2024-08-09 20:03:40,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 3.065e+01 3.431e+01 3.879e+01 6.098e+01, threshold=6.862e+01, percent-clipped=0.0 2024-08-09 20:03:42,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=173980.0, ans=0.1 2024-08-09 20:03:46,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=173980.0, ans=0.125 2024-08-09 20:03:49,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=173980.0, ans=0.1 2024-08-09 20:04:05,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=174180.0, ans=0.125 2024-08-09 20:04:15,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, 
num_groups=1, num_channels=256, metric=17.57 vs. limit=22.5 2024-08-09 20:04:16,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=27.84 vs. limit=22.5 2024-08-09 20:04:21,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=174280.0, ans=0.0 2024-08-09 20:04:29,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=174280.0, ans=0.125 2024-08-09 20:04:30,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.93 vs. limit=22.5 2024-08-09 20:04:35,666 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-09 20:04:47,887 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 2950, loss[loss=0.1073, beats_loss=0.01313, ecapa_loss=0.0003991, whisper_loss=0.0902, over 21549.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01297, ecapa_loss=0.0003489, whisper_loss=0.1023, over 3923410.24 frames. ], batch size: 89, lr: 2.77e-02, grad_scale: 8192.0 2024-08-09 20:04:48,125 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-09 20:04:57,508 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0 2024-08-09 20:05:02,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2024-08-09 20:05:19,503 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
23 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-09 20:05:29,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0 2024-08-09 20:05:30,903 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.88 vs. limit=22.5 2024-08-09 20:05:48,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=174780.0, ans=0.125 2024-08-09 20:05:49,441 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-09 20:06:04,367 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-09 20:06:14,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3000, loss[loss=0.09515, beats_loss=0.01363, ecapa_loss=0.0003103, whisper_loss=0.07842, over 17835.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01302, ecapa_loss=0.0003477, whisper_loss=0.1014, over 3950483.26 frames. ], batch size: 71, lr: 2.77e-02, grad_scale: 8192.0 2024-08-09 20:06:14,463 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-09 20:06:58,484 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on ASR_libri: loss=0.2837, beats_loss=0, ecapa_loss=0.001014, whisper_loss=0.2736, over 922467.00 frames. 2024-08-09 20:07:17,261 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on SV_voxceleb1: loss=0.009278, beats_loss=0, ecapa_loss=0.0009278, whisper_loss=0, over 939242.00 frames. 
2024-08-09 20:08:48,759 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3250, 1.2991, 1.9162, 1.7364], device='cuda:2') 2024-08-09 20:08:50,889 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on AT_audioset: loss=0.03024, beats_loss=0.03024, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 20:08:50,894 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-09 20:08:53,414 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 2.977e+01 3.430e+01 4.027e+01 7.550e+01, threshold=6.860e+01, percent-clipped=3.0 2024-08-09 20:09:24,066 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-09 20:09:24,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175180.0, ans=0.1 2024-08-09 20:09:31,120 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-09 20:10:16,172 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. limit=10.0 2024-08-09 20:10:28,844 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3050, loss[loss=0.132, beats_loss=0.01329, ecapa_loss=0.0003351, whisper_loss=0.1153, over 21999.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01294, ecapa_loss=0.0003495, whisper_loss=0.1024, over 3927997.08 frames. ], batch size: 85, lr: 2.77e-02, grad_scale: 8192.0 2024-08-09 20:10:36,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=175480.0, ans=0.0 2024-08-09 20:10:50,200 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 20:11:00,914 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
19 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-09 20:11:20,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=175680.0, ans=0.1 2024-08-09 20:11:57,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.12 vs. limit=15.0 2024-08-09 20:12:05,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=175880.0, ans=0.0 2024-08-09 20:12:08,051 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 20:12:15,107 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-09 20:12:21,858 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3100, loss[loss=0.1434, beats_loss=0.009996, ecapa_loss=0.0004091, whisper_loss=0.1293, over 22859.00 frames. ], tot_loss[loss=0.1192, beats_loss=0.01289, ecapa_loss=0.0003509, whisper_loss=0.1028, over 3934423.32 frames. ], batch size: 92, lr: 2.76e-02, grad_scale: 8192.0 2024-08-09 20:12:25,168 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 3.112e+01 3.600e+01 4.119e+01 8.540e+01, threshold=7.200e+01, percent-clipped=4.0 2024-08-09 20:12:28,439 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 20:12:42,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=176080.0, ans=0.07 2024-08-09 20:12:50,156 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 20:13:08,808 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-09 20:13:34,095 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
10 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-09 20:13:37,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=176280.0, ans=0.125 2024-08-09 20:14:08,171 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3150, loss[loss=0.1186, beats_loss=0.008996, ecapa_loss=0.0003852, whisper_loss=0.1058, over 14436.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01285, ecapa_loss=0.0003501, whisper_loss=0.1025, over 3904046.62 frames. ], batch size: 54, lr: 2.76e-02, grad_scale: 8192.0 2024-08-09 20:14:10,722 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-09 20:14:27,560 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-09 20:14:33,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=176580.0, ans=0.125 2024-08-09 20:14:38,386 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-09 20:15:18,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=176780.0, ans=0.07 2024-08-09 20:15:34,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176880.0, ans=0.1 2024-08-09 20:15:36,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0 2024-08-09 20:15:44,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.10 vs. 
limit=15.0 2024-08-09 20:15:46,480 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3200, loss[loss=0.1169, beats_loss=0.01216, ecapa_loss=0.0004652, whisper_loss=0.1001, over 13117.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.0129, ecapa_loss=0.0003511, whisper_loss=0.1021, over 3857917.22 frames. ], batch size: 55, lr: 2.76e-02, grad_scale: 8192.0 2024-08-09 20:15:49,063 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.848e+01 3.292e+01 3.822e+01 6.429e+01, threshold=6.585e+01, percent-clipped=0.0 2024-08-09 20:16:06,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=177080.0, ans=0.1 2024-08-09 20:16:20,081 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-09 20:16:40,608 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 20:16:56,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=177380.0, ans=0.125 2024-08-09 20:17:00,897 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3250, loss[loss=0.1153, beats_loss=0.01223, ecapa_loss=0.0003093, whisper_loss=0.09994, over 17614.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01281, ecapa_loss=0.0003507, whisper_loss=0.1018, over 3855604.29 frames. ], batch size: 69, lr: 2.75e-02, grad_scale: 8192.0 2024-08-09 20:17:15,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=177580.0, ans=6.0 2024-08-09 20:17:16,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.63 vs. 
limit=15.0 2024-08-09 20:17:17,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=177580.0, ans=0.0 2024-08-09 20:17:29,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=177680.0, ans=0.125 2024-08-09 20:17:29,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2024-08-09 20:17:31,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=177680.0, ans=0.125 2024-08-09 20:17:41,932 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.55 vs. limit=15.0 2024-08-09 20:17:42,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.57 vs. limit=15.0 2024-08-09 20:17:56,280 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 20:18:14,342 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3300, loss[loss=0.1005, beats_loss=0.01415, ecapa_loss=0.0003243, whisper_loss=0.08315, over 14982.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01282, ecapa_loss=0.0003514, whisper_loss=0.1012, over 3859827.96 frames. 
], batch size: 60, lr: 2.75e-02, grad_scale: 8192.0 2024-08-09 20:18:18,072 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 3.080e+01 3.504e+01 4.263e+01 7.840e+01, threshold=7.009e+01, percent-clipped=4.0 2024-08-09 20:18:30,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=178080.0, ans=0.05 2024-08-09 20:18:52,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=178180.0, ans=0.125 2024-08-09 20:19:02,627 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 21 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-09 20:19:08,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=178280.0, ans=0.1 2024-08-09 20:19:10,902 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-09 20:19:24,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=178380.0, ans=0.125 2024-08-09 20:19:24,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=178380.0, ans=0.0 2024-08-09 20:19:36,089 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3350, loss[loss=0.1267, beats_loss=0.01155, ecapa_loss=0.0003927, whisper_loss=0.1113, over 19206.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01283, ecapa_loss=0.0003539, whisper_loss=0.1013, over 3886723.97 frames. ], batch size: 76, lr: 2.75e-02, grad_scale: 8192.0 2024-08-09 20:19:54,312 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 20:20:09,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=178680.0, ans=0.025 2024-08-09 20:20:14,472 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.613e+00 2024-08-09 20:20:14,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=178680.0, ans=0.0 2024-08-09 20:20:25,153 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-09 20:20:26,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=178780.0, ans=0.0 2024-08-09 20:20:31,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=178780.0, ans=0.0 2024-08-09 20:20:31,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=178780.0, ans=0.0 2024-08-09 20:20:45,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=178880.0, ans=0.0 2024-08-09 20:20:58,058 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3400, loss[loss=0.1184, beats_loss=0.01059, ecapa_loss=0.0003222, whisper_loss=0.1046, over 20612.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01283, ecapa_loss=0.0003522, whisper_loss=0.1012, over 3877729.06 frames. 
], batch size: 78, lr: 2.74e-02, grad_scale: 8192.0 2024-08-09 20:20:58,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=178980.0, ans=0.0 2024-08-09 20:21:00,527 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 2.994e+01 3.327e+01 4.294e+01 6.950e+01, threshold=6.654e+01, percent-clipped=0.0 2024-08-09 20:21:00,663 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 20:21:20,588 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.68 vs. limit=15.0 2024-08-09 20:21:24,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2024-08-09 20:21:38,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=179180.0, ans=0.125 2024-08-09 20:22:14,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.23 vs. limit=22.5 2024-08-09 20:22:21,731 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3450, loss[loss=0.1282, beats_loss=0.01213, ecapa_loss=0.000281, whisper_loss=0.1133, over 23826.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01289, ecapa_loss=0.0003527, whisper_loss=0.1005, over 3874184.99 frames. ], batch size: 92, lr: 2.74e-02, grad_scale: 8192.0 2024-08-09 20:22:32,742 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 13 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 20:22:45,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=179580.0, ans=0.125 2024-08-09 20:23:20,712 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
27 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-09 20:23:31,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=179880.0, ans=0.0 2024-08-09 20:23:34,655 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 20:23:43,968 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3500, loss[loss=0.1233, beats_loss=0.01287, ecapa_loss=0.0003111, whisper_loss=0.1073, over 22510.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01285, ecapa_loss=0.0003505, whisper_loss=0.1005, over 3869904.02 frames. ], batch size: 87, lr: 2.74e-02, grad_scale: 8192.0 2024-08-09 20:23:47,153 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.966e+01 3.324e+01 3.987e+01 6.193e+01, threshold=6.648e+01, percent-clipped=0.0 2024-08-09 20:23:59,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=180080.0, ans=0.0 2024-08-09 20:24:08,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=180080.0, ans=0.0 2024-08-09 20:24:16,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=180180.0, ans=0.09899494936611666 2024-08-09 20:24:20,640 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-09 20:24:32,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=180280.0, ans=0.125 2024-08-09 20:24:57,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=180380.0, ans=0.0 2024-08-09 20:25:03,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=180380.0, ans=0.2 2024-08-09 20:25:08,170 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3550, loss[loss=0.104, beats_loss=0.0127, ecapa_loss=0.000358, whisper_loss=0.08776, over 19562.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01286, ecapa_loss=0.0003483, whisper_loss=0.1, over 3878407.88 frames. ], batch size: 78, lr: 2.73e-02, grad_scale: 16384.0 2024-08-09 20:25:11,712 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-09 20:25:25,182 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-09 20:25:31,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=180580.0, ans=0.125 2024-08-09 20:25:36,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=180580.0, ans=0.0 2024-08-09 20:25:44,379 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 20:25:46,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2024-08-09 20:25:49,970 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5 2024-08-09 20:25:56,858 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
24 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-09 20:26:12,433 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-09 20:26:35,296 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3600, loss[loss=0.1157, beats_loss=0.0154, ecapa_loss=0.000257, whisper_loss=0.09774, over 23333.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01277, ecapa_loss=0.0003484, whisper_loss=0.1001, over 3890386.43 frames. ], batch size: 92, lr: 2.73e-02, grad_scale: 16384.0 2024-08-09 20:26:36,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.68 vs. limit=22.5 2024-08-09 20:26:38,495 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+01 2.970e+01 3.508e+01 4.140e+01 6.583e+01, threshold=7.015e+01, percent-clipped=0.0 2024-08-09 20:26:40,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=180980.0, ans=0.0 2024-08-09 20:26:40,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=180980.0, ans=0.125 2024-08-09 20:26:47,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=180980.0, ans=0.125 2024-08-09 20:26:53,399 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-09 20:27:04,737 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=22.5 2024-08-09 20:27:21,543 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-09 20:27:31,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=181280.0, ans=0.125 2024-08-09 20:27:36,182 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 20:27:38,642 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 34 from Vox, 32 fro AS 2024-08-09 20:27:38,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181380.0, ans=0.1 2024-08-09 20:27:56,688 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3650, loss[loss=0.1199, beats_loss=0.01272, ecapa_loss=0.0004556, whisper_loss=0.1026, over 20301.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01282, ecapa_loss=0.0003484, whisper_loss=0.1006, over 3896787.02 frames. ], batch size: 90, lr: 2.73e-02, grad_scale: 16384.0 2024-08-09 20:28:00,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=181480.0, ans=0.0 2024-08-09 20:28:02,942 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-09 20:28:20,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=181580.0, ans=0.0 2024-08-09 20:28:55,941 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-09 20:29:19,137 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3700, loss[loss=0.09703, beats_loss=0.01412, ecapa_loss=0.0002639, whisper_loss=0.08028, over 15994.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01283, ecapa_loss=0.0003487, whisper_loss=0.101, over 3885938.94 frames. 
], batch size: 59, lr: 2.72e-02, grad_scale: 16384.0 2024-08-09 20:29:22,367 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.937e+01 3.354e+01 4.017e+01 7.791e+01, threshold=6.707e+01, percent-clipped=1.0 2024-08-09 20:29:32,376 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=15.0 2024-08-09 20:29:34,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.02 vs. limit=15.0 2024-08-09 20:29:59,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=182180.0, ans=0.125 2024-08-09 20:30:32,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.01 vs. limit=15.0 2024-08-09 20:30:39,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3750, loss[loss=0.1112, beats_loss=0.01358, ecapa_loss=0.0003214, whisper_loss=0.0944, over 23068.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01286, ecapa_loss=0.0003464, whisper_loss=0.1008, over 3881028.37 frames. ], batch size: 91, lr: 2.72e-02, grad_scale: 16384.0 2024-08-09 20:30:41,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.09 vs. limit=15.0 2024-08-09 20:30:42,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=182480.0, ans=0.0 2024-08-09 20:30:44,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.96 vs. limit=22.5 2024-08-09 20:30:57,779 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
20 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-09 20:31:01,949 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.84 vs. limit=22.5 2024-08-09 20:31:03,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2024-08-09 20:31:21,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=182680.0, ans=0.0 2024-08-09 20:31:59,372 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3800, loss[loss=0.07953, beats_loss=0.01377, ecapa_loss=0.0004578, whisper_loss=0.06118, over 11798.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01293, ecapa_loss=0.0003487, whisper_loss=0.1005, over 3889891.42 frames. ], batch size: 54, lr: 2.72e-02, grad_scale: 16384.0 2024-08-09 20:31:59,574 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-09 20:32:01,764 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.977e+01 3.395e+01 3.964e+01 6.825e+01, threshold=6.789e+01, percent-clipped=1.0 2024-08-09 20:32:32,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5 2024-08-09 20:32:41,748 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 20:32:41,993 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 20:32:48,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.91 vs. 
limit=15.0 2024-08-09 20:33:10,574 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.47 vs. limit=22.5 2024-08-09 20:33:16,040 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3850, loss[loss=0.1209, beats_loss=0.01396, ecapa_loss=0.000372, whisper_loss=0.1032, over 22235.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01302, ecapa_loss=0.0003488, whisper_loss=0.1005, over 3881534.99 frames. ], batch size: 89, lr: 2.71e-02, grad_scale: 16384.0 2024-08-09 20:33:18,864 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-09 20:33:28,369 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 14 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-09 20:33:31,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=183580.0, ans=0.0 2024-08-09 20:33:54,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.91 vs. limit=22.5 2024-08-09 20:33:55,303 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-09 20:34:33,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=183980.0, ans=0.0 2024-08-09 20:34:35,207 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3900, loss[loss=0.121, beats_loss=0.01226, ecapa_loss=0.0002824, whisper_loss=0.1059, over 14968.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01293, ecapa_loss=0.0003503, whisper_loss=0.1013, over 3883592.86 frames. 
], batch size: 57, lr: 2.71e-02, grad_scale: 16384.0 2024-08-09 20:34:38,593 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 2.932e+01 3.278e+01 3.846e+01 7.989e+01, threshold=6.556e+01, percent-clipped=2.0 2024-08-09 20:34:49,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=183980.0, ans=0.0 2024-08-09 20:35:11,324 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-09 20:35:19,513 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-09 20:35:22,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=184280.0, ans=0.1 2024-08-09 20:35:24,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=184280.0, ans=0.125 2024-08-09 20:35:30,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=184280.0, ans=0.0 2024-08-09 20:35:35,423 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-09 20:35:35,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2024-08-09 20:35:41,487 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2024-08-09 20:35:56,652 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 3950, loss[loss=0.08869, beats_loss=0.01713, ecapa_loss=0.0003275, whisper_loss=0.06828, over 17938.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01281, ecapa_loss=0.0003502, whisper_loss=0.1018, over 3873871.87 frames. 
], batch size: 75, lr: 2.71e-02, grad_scale: 16384.0 2024-08-09 20:36:10,003 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-09 20:36:11,385 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-09 20:36:16,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=184580.0, ans=0.125 2024-08-09 20:36:20,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0 2024-08-09 20:36:21,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=184580.0, ans=0.0 2024-08-09 20:36:35,909 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-09 20:36:40,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2024-08-09 20:36:40,887 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
26 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-09 20:36:44,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=184780.0, ans=0.2 2024-08-09 20:36:47,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=184780.0, ans=0.125 2024-08-09 20:36:53,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=184780.0, ans=0.5 2024-08-09 20:37:06,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=184880.0, ans=0.125 2024-08-09 20:37:14,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4000, loss[loss=0.1165, beats_loss=0.01166, ecapa_loss=0.000395, whisper_loss=0.1009, over 19974.00 frames. ], tot_loss[loss=0.1192, beats_loss=0.0127, ecapa_loss=0.0003515, whisper_loss=0.1029, over 3865037.45 frames. ], batch size: 83, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:37:17,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.368e+01 2.965e+01 3.379e+01 3.827e+01 6.548e+01, threshold=6.758e+01, percent-clipped=0.0 2024-08-09 20:37:20,705 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 20:37:25,872 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
23 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-09 20:37:30,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=185080.0, ans=0.035 2024-08-09 20:37:30,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=185080.0, ans=0.0 2024-08-09 20:37:35,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=185080.0, ans=0.125 2024-08-09 20:37:41,703 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 15 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-09 20:37:50,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=185180.0, ans=0.1 2024-08-09 20:38:12,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=12.0 2024-08-09 20:38:21,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=185380.0, ans=0.125 2024-08-09 20:38:21,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=185380.0, ans=0.0 2024-08-09 20:38:21,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=185380.0, ans=0.0 2024-08-09 20:38:30,371 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4050, loss[loss=0.1248, beats_loss=0.01443, ecapa_loss=0.0002974, whisper_loss=0.1074, over 22050.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01266, ecapa_loss=0.0003519, whisper_loss=0.103, over 3852679.94 frames. 
], batch size: 87, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:38:54,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=185580.0, ans=0.0 2024-08-09 20:39:01,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.99 vs. limit=22.5 2024-08-09 20:39:10,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=185680.0, ans=10.0 2024-08-09 20:39:20,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=185780.0, ans=0.125 2024-08-09 20:39:23,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.86 vs. limit=22.5 2024-08-09 20:39:37,030 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 20:39:39,472 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4100, loss[loss=0.1025, beats_loss=0.01296, ecapa_loss=0.0004046, whisper_loss=0.08547, over 21881.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01267, ecapa_loss=0.000351, whisper_loss=0.1028, over 3906975.67 frames. ], batch size: 94, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:39:42,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 3.015e+01 3.336e+01 4.132e+01 1.372e+02, threshold=6.672e+01, percent-clipped=1.0 2024-08-09 20:39:46,192 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
24 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-09 20:39:49,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=185980.0, ans=0.125 2024-08-09 20:40:01,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=186080.0, ans=0.0 2024-08-09 20:40:02,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=186080.0, ans=0.0 2024-08-09 20:40:14,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=186180.0, ans=0.125 2024-08-09 20:40:30,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2024-08-09 20:40:38,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=186380.0, ans=0.125 2024-08-09 20:40:45,990 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4150, loss[loss=0.1061, beats_loss=0.01512, ecapa_loss=0.0003401, whisper_loss=0.0876, over 18928.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01274, ecapa_loss=0.000348, whisper_loss=0.1026, over 3931891.75 frames. ], batch size: 79, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:41:02,044 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-09 20:41:11,704 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-09 20:41:16,769 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
27 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-09 20:41:17,172 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 20:41:18,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=186680.0, ans=0.125 2024-08-09 20:41:42,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=186880.0, ans=0.0 2024-08-09 20:41:52,548 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4200, loss[loss=0.117, beats_loss=0.0121, ecapa_loss=0.0002787, whisper_loss=0.1021, over 18319.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01274, ecapa_loss=0.0003478, whisper_loss=0.1029, over 3921853.99 frames. ], batch size: 70, lr: 2.69e-02, grad_scale: 16384.0 2024-08-09 20:41:54,916 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.958e+01 3.347e+01 3.898e+01 6.800e+01, threshold=6.694e+01, percent-clipped=1.0 2024-08-09 20:41:55,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=186980.0, ans=0.0 2024-08-09 20:41:59,281 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-09 20:42:02,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=186980.0, ans=0.1 2024-08-09 20:42:07,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=187080.0, ans=15.0 2024-08-09 20:42:08,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.90 vs. 
limit=15.0 2024-08-09 20:42:10,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187080.0, ans=0.1 2024-08-09 20:42:11,854 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-09 20:42:13,208 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-09 20:42:17,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=187180.0, ans=0.125 2024-08-09 20:42:49,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. limit=6.0 2024-08-09 20:42:50,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=187380.0, ans=0.125 2024-08-09 20:42:52,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=187380.0, ans=0.125 2024-08-09 20:42:57,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=187480.0, ans=0.05 2024-08-09 20:42:58,070 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4250, loss[loss=0.1374, beats_loss=0.01067, ecapa_loss=0.0003351, whisper_loss=0.1234, over 16817.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01278, ecapa_loss=0.0003479, whisper_loss=0.1023, over 3905989.72 frames. ], batch size: 67, lr: 2.69e-02, grad_scale: 16384.0 2024-08-09 20:43:18,402 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.83 vs. 
limit=10.0 2024-08-09 20:43:52,849 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.824e-01 2024-08-09 20:43:58,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=187880.0, ans=0.0 2024-08-09 20:44:03,841 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4300, loss[loss=0.09798, beats_loss=0.01541, ecapa_loss=0.0003452, whisper_loss=0.07912, over 19755.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01283, ecapa_loss=0.0003482, whisper_loss=0.1016, over 3889147.66 frames. ], batch size: 83, lr: 2.69e-02, grad_scale: 16384.0 2024-08-09 20:44:04,022 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-09 20:44:06,743 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.942e+01 3.508e+01 4.302e+01 6.032e+01, threshold=7.016e+01, percent-clipped=0.0 2024-08-09 20:44:14,629 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-09 20:44:15,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-09 20:44:24,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=188080.0, ans=0.125 2024-08-09 20:44:27,558 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 37 from Vox, 33 fro AS 2024-08-09 20:44:31,685 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-09 20:44:35,766 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
24 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-09 20:44:36,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=188180.0, ans=0.125 2024-08-09 20:44:41,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=188180.0, ans=0.2 2024-08-09 20:44:59,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=188380.0, ans=0.0 2024-08-09 20:45:00,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=188380.0, ans=6.0 2024-08-09 20:45:09,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4350, loss[loss=0.1362, beats_loss=0.009581, ecapa_loss=0.0003868, whisper_loss=0.1228, over 22166.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01276, ecapa_loss=0.0003491, whisper_loss=0.102, over 3893714.95 frames. ], batch size: 88, lr: 2.68e-02, grad_scale: 16384.0 2024-08-09 20:45:12,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2024-08-09 20:45:13,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=188480.0, ans=0.0 2024-08-09 20:45:19,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2024-08-09 20:45:50,990 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-09 20:45:55,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=188780.0, ans=0.0 2024-08-09 20:45:55,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=188780.0, ans=0.125 2024-08-09 20:45:58,824 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 20:46:03,380 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-09 20:46:07,765 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 26 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-09 20:46:16,396 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-09 20:46:20,438 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4400, loss[loss=0.1347, beats_loss=0.01135, ecapa_loss=0.0003614, whisper_loss=0.1197, over 19768.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01269, ecapa_loss=0.0003473, whisper_loss=0.1022, over 3899895.10 frames. ], batch size: 81, lr: 2.68e-02, grad_scale: 16384.0 2024-08-09 20:46:23,481 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.890e+01 3.311e+01 3.807e+01 6.108e+01, threshold=6.622e+01, percent-clipped=0.0 2024-08-09 20:46:48,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=189080.0, ans=0.125 2024-08-09 20:47:23,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=189380.0, ans=0.125 2024-08-09 20:47:38,358 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4450, loss[loss=0.1164, beats_loss=0.0125, ecapa_loss=0.0003149, whisper_loss=0.1007, over 20941.00 frames. 
], tot_loss[loss=0.1189, beats_loss=0.0126, ecapa_loss=0.0003473, whisper_loss=0.1028, over 3887321.05 frames. ], batch size: 81, lr: 2.68e-02, grad_scale: 16384.0 2024-08-09 20:47:42,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=189480.0, ans=0.125 2024-08-09 20:47:45,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=189480.0, ans=0.5 2024-08-09 20:47:53,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=189580.0, ans=0.125 2024-08-09 20:48:01,912 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-09 20:48:28,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=189780.0, ans=0.0 2024-08-09 20:48:30,179 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 24 from LS+wenet, 11 from Vox, 19 fro AS 2024-08-09 20:48:41,613 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-09 20:48:45,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2024-08-09 20:49:02,952 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4500, loss[loss=0.09887, beats_loss=0.01797, ecapa_loss=0.0002145, whisper_loss=0.07875, over 16014.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01266, ecapa_loss=0.000346, whisper_loss=0.1022, over 3876049.15 frames. ], batch size: 61, lr: 2.67e-02, grad_scale: 16384.0 2024-08-09 20:49:05,835 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.93 vs. 
limit=10.0 2024-08-09 20:49:06,602 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.955e+01 3.431e+01 3.879e+01 5.998e+01, threshold=6.863e+01, percent-clipped=0.0 2024-08-09 20:49:06,826 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-09 20:49:30,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=190080.0, ans=0.1 2024-08-09 20:49:45,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=190180.0, ans=0.125 2024-08-09 20:49:47,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=190180.0, ans=0.0 2024-08-09 20:49:54,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=190280.0, ans=0.5 2024-08-09 20:50:14,552 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.00 vs. limit=6.0 2024-08-09 20:50:21,753 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 20:50:24,563 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4550, loss[loss=0.1132, beats_loss=0.01028, ecapa_loss=0.0004, whisper_loss=0.09887, over 20016.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01263, ecapa_loss=0.0003473, whisper_loss=0.1022, over 3849197.64 frames. ], batch size: 81, lr: 2.67e-02, grad_scale: 16384.0 2024-08-09 20:50:31,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=190480.0, ans=0.0 2024-08-09 20:50:42,054 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
20 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-09 20:51:11,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=190780.0, ans=0.125 2024-08-09 20:51:23,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=190780.0, ans=0.02 2024-08-09 20:51:45,653 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4600, loss[loss=0.1244, beats_loss=0.01137, ecapa_loss=0.0003679, whisper_loss=0.1093, over 22742.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01265, ecapa_loss=0.0003457, whisper_loss=0.1017, over 3857537.33 frames. ], batch size: 92, lr: 2.67e-02, grad_scale: 16384.0 2024-08-09 20:51:48,713 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.933e+01 3.481e+01 4.250e+01 8.633e+01, threshold=6.961e+01, percent-clipped=3.0 2024-08-09 20:51:53,321 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 20:51:56,078 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 20:52:23,504 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-09 20:52:53,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=191380.0, ans=0.1 2024-08-09 20:53:05,213 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4650, loss[loss=0.1127, beats_loss=0.01591, ecapa_loss=0.0003263, whisper_loss=0.09354, over 22602.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01271, ecapa_loss=0.0003448, whisper_loss=0.1013, over 3855779.75 frames. 
], batch size: 93, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:53:18,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=191480.0, ans=0.125 2024-08-09 20:53:18,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0 2024-08-09 20:53:26,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=191580.0, ans=0.2 2024-08-09 20:53:31,075 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 20:53:53,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=191780.0, ans=0.125 2024-08-09 20:53:55,111 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 39 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-09 20:54:11,082 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-09 20:54:11,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=191880.0, ans=0.0 2024-08-09 20:54:17,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=191880.0, ans=0.125 2024-08-09 20:54:25,258 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4700, loss[loss=0.09976, beats_loss=0.01425, ecapa_loss=0.0003517, whisper_loss=0.08199, over 14295.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01282, ecapa_loss=0.0003399, whisper_loss=0.1019, over 3843406.76 frames. 
], batch size: 59, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:54:28,071 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.995e+01 3.606e+01 4.056e+01 7.854e+01, threshold=7.212e+01, percent-clipped=1.0 2024-08-09 20:54:33,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=191980.0, ans=0.125 2024-08-09 20:54:33,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=191980.0, ans=0.1 2024-08-09 20:54:49,412 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2024-08-09 20:54:52,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=192080.0, ans=0.0 2024-08-09 20:54:56,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=192180.0, ans=0.125 2024-08-09 20:55:05,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.96 vs. limit=15.0 2024-08-09 20:55:19,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192280.0, ans=0.1 2024-08-09 20:55:19,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2024-08-09 20:55:34,392 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 20:55:41,594 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
27 from LS+wenet, 36 from Vox, 32 fro AS 2024-08-09 20:55:45,802 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4750, loss[loss=0.111, beats_loss=0.01446, ecapa_loss=0.0003304, whisper_loss=0.09328, over 20996.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01294, ecapa_loss=0.00034, whisper_loss=0.1008, over 3852080.40 frames. ], batch size: 83, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:55:48,415 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-09 20:55:48,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=192480.0, ans=0.125 2024-08-09 20:56:19,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192680.0, ans=0.1 2024-08-09 20:56:25,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=192680.0, ans=0.035 2024-08-09 20:56:41,648 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 20:56:43,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=192780.0, ans=0.09899494936611666 2024-08-09 20:56:46,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192880.0, ans=0.1 2024-08-09 20:56:48,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=192880.0, ans=0.125 2024-08-09 20:56:50,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. 
limit=15.0 2024-08-09 20:57:00,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=192880.0, ans=0.2 2024-08-09 20:57:04,166 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4800, loss[loss=0.1225, beats_loss=0.01304, ecapa_loss=0.0003806, whisper_loss=0.1057, over 20414.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01292, ecapa_loss=0.0003412, whisper_loss=0.1018, over 3872702.16 frames. ], batch size: 83, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:57:07,360 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 3.258e+01 3.599e+01 4.060e+01 6.614e+01, threshold=7.198e+01, percent-clipped=0.0 2024-08-09 20:57:12,422 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 20:57:35,130 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 11 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-09 20:57:45,453 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 20:58:10,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=22.5 2024-08-09 20:58:11,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.10 vs. limit=10.0 2024-08-09 20:58:17,904 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4850, loss[loss=0.1049, beats_loss=0.01151, ecapa_loss=0.0004143, whisper_loss=0.0893, over 15102.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.0129, ecapa_loss=0.0003428, whisper_loss=0.1021, over 3891549.81 frames. 
], batch size: 63, lr: 2.65e-02, grad_scale: 16384.0 2024-08-09 20:58:24,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=193480.0, ans=0.125 2024-08-09 20:58:33,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=193580.0, ans=0.07 2024-08-09 20:58:50,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=193680.0, ans=0.07 2024-08-09 20:58:50,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193680.0, ans=0.1 2024-08-09 20:58:55,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=4.94 vs. limit=15.0 2024-08-09 20:59:08,210 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-09 20:59:27,453 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4900, loss[loss=0.1151, beats_loss=0.01174, ecapa_loss=0.0003888, whisper_loss=0.09952, over 19227.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01285, ecapa_loss=0.0003436, whisper_loss=0.102, over 3856983.54 frames. 
], batch size: 76, lr: 2.65e-02, grad_scale: 16384.0 2024-08-09 20:59:30,440 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.990e+01 3.252e+01 3.746e+01 5.696e+01, threshold=6.504e+01, percent-clipped=0.0 2024-08-09 20:59:40,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194080.0, ans=0.1 2024-08-09 20:59:41,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194080.0, ans=0.1 2024-08-09 20:59:46,058 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.743e-02 2024-08-09 21:00:00,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=194180.0, ans=0.0 2024-08-09 21:00:05,894 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 21:00:36,248 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 4950, loss[loss=0.1173, beats_loss=0.01594, ecapa_loss=0.0003376, whisper_loss=0.09797, over 17650.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01285, ecapa_loss=0.0003439, whisper_loss=0.1019, over 3840030.38 frames. ], batch size: 72, lr: 2.65e-02, grad_scale: 16384.0 2024-08-09 21:00:44,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=194480.0, ans=0.125 2024-08-09 21:00:51,183 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 21:00:55,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=194580.0, ans=0.0 2024-08-09 21:01:10,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=194680.0, ans=0.1 2024-08-09 21:01:32,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=194880.0, ans=0.2 2024-08-09 21:01:35,939 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-09 21:01:39,184 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-09 21:01:39,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=194880.0, ans=0.2 2024-08-09 21:01:43,968 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5000, loss[loss=0.09757, beats_loss=0.01279, ecapa_loss=0.000318, whisper_loss=0.08161, over 22407.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01278, ecapa_loss=0.0003445, whisper_loss=0.1024, over 3838918.08 frames. ], batch size: 88, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:01:46,814 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.882e+01 3.259e+01 3.861e+01 5.497e+01, threshold=6.518e+01, percent-clipped=0.0 2024-08-09 21:01:50,638 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 24 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-09 21:01:54,529 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-09 21:02:19,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.45 vs. 
limit=22.5 2024-08-09 21:02:31,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=195280.0, ans=0.07 2024-08-09 21:02:51,135 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5050, loss[loss=0.1114, beats_loss=0.01375, ecapa_loss=0.0003665, whisper_loss=0.09396, over 17802.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01288, ecapa_loss=0.0003431, whisper_loss=0.102, over 3858686.78 frames. ], batch size: 72, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:03:09,760 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-09 21:03:19,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=195680.0, ans=0.0 2024-08-09 21:03:25,359 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-09 21:03:37,298 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-09 21:03:37,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.39 vs. limit=10.0 2024-08-09 21:03:42,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=195880.0, ans=0.125 2024-08-09 21:03:48,837 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.52 vs. limit=15.0 2024-08-09 21:03:54,814 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 21:03:55,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=195880.0, ans=0.125 2024-08-09 21:03:57,170 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5100, loss[loss=0.1179, beats_loss=0.01472, ecapa_loss=0.0004032, whisper_loss=0.09915, over 20885.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.0128, ecapa_loss=0.0003429, whisper_loss=0.1026, over 3882949.34 frames. ], batch size: 92, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:03:59,936 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.875e+01 3.306e+01 3.993e+01 6.485e+01, threshold=6.613e+01, percent-clipped=0.0 2024-08-09 21:04:01,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=195980.0, ans=0.0 2024-08-09 21:04:38,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=196280.0, ans=0.2 2024-08-09 21:04:54,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=196380.0, ans=0.2 2024-08-09 21:04:57,621 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-09 21:05:05,572 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5150, loss[loss=0.09475, beats_loss=0.01277, ecapa_loss=0.0003114, whisper_loss=0.07886, over 13597.00 frames. ], tot_loss[loss=0.1192, beats_loss=0.0128, ecapa_loss=0.0003377, whisper_loss=0.103, over 3861495.40 frames. ], batch size: 55, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:05:06,984 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 21:05:15,387 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 21:05:15,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=196480.0, ans=0.2 2024-08-09 21:05:19,212 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-09 21:05:28,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=196580.0, ans=0.0 2024-08-09 21:05:35,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=196680.0, ans=0.0 2024-08-09 21:05:36,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=196680.0, ans=0.125 2024-08-09 21:05:41,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=196680.0, ans=0.125 2024-08-09 21:05:42,026 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2024-08-09 21:05:49,471 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-09 21:05:56,044 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.44 vs. limit=15.0 2024-08-09 21:06:13,493 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5200, loss[loss=0.1171, beats_loss=0.008636, ecapa_loss=0.0003654, whisper_loss=0.1048, over 15828.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01275, ecapa_loss=0.000335, whisper_loss=0.1037, over 3873834.88 frames. ], batch size: 59, lr: 2.63e-02, grad_scale: 16384.0 2024-08-09 21:06:14,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.67 vs. 
limit=15.0 2024-08-09 21:06:16,142 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.861e+01 3.315e+01 3.921e+01 5.764e+01, threshold=6.630e+01, percent-clipped=0.0 2024-08-09 21:06:34,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=197080.0, ans=0.125 2024-08-09 21:06:45,286 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 21:06:53,201 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 21:06:59,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2024-08-09 21:07:09,497 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 8 from Vox, 30 fro AS 2024-08-09 21:07:11,353 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.15 vs. limit=22.5 2024-08-09 21:07:20,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=197480.0, ans=0.0 2024-08-09 21:07:21,285 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5250, loss[loss=0.1007, beats_loss=0.0141, ecapa_loss=0.0003817, whisper_loss=0.08283, over 20172.00 frames. ], tot_loss[loss=0.1189, beats_loss=0.01278, ecapa_loss=0.0003338, whisper_loss=0.1028, over 3854865.76 frames. ], batch size: 87, lr: 2.63e-02, grad_scale: 16384.0 2024-08-09 21:07:29,620 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
31 from LS+wenet, 6 from Vox, 37 fro AS 2024-08-09 21:07:54,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197680.0, ans=0.1 2024-08-09 21:08:07,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=197780.0, ans=0.0 2024-08-09 21:08:08,625 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-09 21:08:13,199 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 21:08:13,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=197780.0, ans=0.1 2024-08-09 21:08:19,145 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 21:08:30,368 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5300, loss[loss=0.09147, beats_loss=0.01243, ecapa_loss=0.0004055, whisper_loss=0.07499, over 14975.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01272, ecapa_loss=0.0003368, whisper_loss=0.103, over 3840281.27 frames. ], batch size: 63, lr: 2.63e-02, grad_scale: 16384.0 2024-08-09 21:08:32,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=197980.0, ans=0.1 2024-08-09 21:08:33,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.918e+01 3.459e+01 4.148e+01 6.900e+01, threshold=6.919e+01, percent-clipped=2.0 2024-08-09 21:08:34,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=12.0 2024-08-09 21:08:36,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.01 vs. 
limit=15.0 2024-08-09 21:08:42,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=197980.0, ans=0.125 2024-08-09 21:08:43,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=198080.0, ans=0.125 2024-08-09 21:08:43,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198080.0, ans=0.1 2024-08-09 21:08:46,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.90 vs. limit=15.0 2024-08-09 21:08:48,617 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-09 21:08:57,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=198180.0, ans=0.2 2024-08-09 21:09:22,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=198280.0, ans=0.125 2024-08-09 21:09:40,425 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5350, loss[loss=0.1191, beats_loss=0.01401, ecapa_loss=0.0002882, whisper_loss=0.1022, over 22348.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01269, ecapa_loss=0.0003362, whisper_loss=0.1027, over 3839008.01 frames. 
], batch size: 88, lr: 2.62e-02, grad_scale: 16384.0 2024-08-09 21:10:02,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=198580.0, ans=0.0 2024-08-09 21:10:02,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=198580.0, ans=0.125 2024-08-09 21:10:05,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=198580.0, ans=0.125 2024-08-09 21:10:27,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=198780.0, ans=0.035 2024-08-09 21:10:40,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=198880.0, ans=0.125 2024-08-09 21:10:46,194 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 21 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-09 21:10:47,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=198880.0, ans=0.2 2024-08-09 21:10:50,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198880.0, ans=0.1 2024-08-09 21:10:52,691 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5400, loss[loss=0.1113, beats_loss=0.01197, ecapa_loss=0.000326, whisper_loss=0.09607, over 22297.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01261, ecapa_loss=0.0003375, whisper_loss=0.1026, over 3823010.65 frames. ], batch size: 89, lr: 2.62e-02, grad_scale: 16384.0 2024-08-09 21:10:55,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.905e+01 3.438e+01 3.898e+01 7.093e+01, threshold=6.876e+01, percent-clipped=1.0 2024-08-09 21:11:01,412 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
17 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-09 21:11:02,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.40 vs. limit=15.0 2024-08-09 21:11:08,339 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 21:11:10,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=199080.0, ans=0.125 2024-08-09 21:11:15,626 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-09 21:11:18,704 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 21:12:02,697 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-09 21:12:06,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5450, loss[loss=0.1238, beats_loss=0.01193, ecapa_loss=0.000337, whisper_loss=0.1085, over 23533.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01268, ecapa_loss=0.0003373, whisper_loss=0.1024, over 3876359.07 frames. ], batch size: 93, lr: 2.62e-02, grad_scale: 16384.0 2024-08-09 21:12:09,746 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 31 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 21:12:18,531 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-09 21:12:19,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=199580.0, ans=0.1 2024-08-09 21:13:11,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=199880.0, ans=0.125 2024-08-09 21:13:16,658 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 21:13:17,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=199980.0, ans=0.125 2024-08-09 21:13:18,001 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5500, loss[loss=0.1157, beats_loss=0.01305, ecapa_loss=0.0003427, whisper_loss=0.09925, over 21869.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01269, ecapa_loss=0.0003391, whisper_loss=0.1025, over 3871450.43 frames. ], batch size: 88, lr: 2.61e-02, grad_scale: 16384.0 2024-08-09 21:13:23,467 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 3.012e+01 3.355e+01 3.811e+01 5.286e+01, threshold=6.711e+01, percent-clipped=0.0 2024-08-09 21:13:28,080 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-09 21:13:40,874 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-09 21:14:12,788 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-09 21:14:21,705 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-09 21:14:22,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=200380.0, ans=0.1 2024-08-09 21:14:33,105 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5550, loss[loss=0.1357, beats_loss=0.009996, ecapa_loss=0.0003719, whisper_loss=0.122, over 14872.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01267, ecapa_loss=0.0003416, whisper_loss=0.102, over 3876577.37 frames. 
], batch size: 56, lr: 2.61e-02, grad_scale: 32768.0 2024-08-09 21:14:33,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=200480.0, ans=0.125 2024-08-09 21:14:36,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=200480.0, ans=0.0 2024-08-09 21:14:43,574 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-09 21:14:50,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.47 vs. limit=22.5 2024-08-09 21:14:52,024 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 21:15:11,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=200680.0, ans=0.125 2024-08-09 21:15:26,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=200780.0, ans=0.125 2024-08-09 21:15:28,742 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 15 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 21:15:46,410 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5600, loss[loss=0.1289, beats_loss=0.01251, ecapa_loss=0.0003128, whisper_loss=0.1133, over 20937.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01269, ecapa_loss=0.0003395, whisper_loss=0.1022, over 3870481.54 frames. 
], batch size: 82, lr: 2.61e-02, grad_scale: 32768.0 2024-08-09 21:15:48,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=200980.0, ans=0.1 2024-08-09 21:15:49,812 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 3.019e+01 3.603e+01 4.139e+01 2.249e+02, threshold=7.206e+01, percent-clipped=7.0 2024-08-09 21:15:51,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=200980.0, ans=0.0 2024-08-09 21:15:53,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=200980.0, ans=0.1 2024-08-09 21:15:56,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=200980.0, ans=0.1 2024-08-09 21:16:02,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201080.0, ans=0.1 2024-08-09 21:16:12,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=201080.0, ans=0.1 2024-08-09 21:16:20,078 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-09 21:16:28,183 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 21:16:29,449 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
31 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-09 21:16:29,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=201280.0, ans=0.0 2024-08-09 21:16:39,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=201280.0, ans=0.125 2024-08-09 21:16:43,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=201380.0, ans=0.0 2024-08-09 21:16:44,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=201380.0, ans=0.125 2024-08-09 21:16:56,076 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5650, loss[loss=0.1235, beats_loss=0.01042, ecapa_loss=0.0003377, whisper_loss=0.1097, over 18169.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01272, ecapa_loss=0.0003392, whisper_loss=0.1018, over 3899460.99 frames. ], batch size: 71, lr: 2.61e-02, grad_scale: 32768.0 2024-08-09 21:17:00,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2024-08-09 21:17:05,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=201480.0, ans=0.0 2024-08-09 21:17:05,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=201480.0, ans=0.0 2024-08-09 21:17:20,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=201580.0, ans=0.125 2024-08-09 21:17:25,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=201680.0, ans=0.125 2024-08-09 21:17:37,798 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-09 21:17:41,516 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-09 21:17:44,087 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-09 21:17:48,765 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.36 vs. limit=15.0 2024-08-09 21:17:52,677 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=12.0 2024-08-09 21:17:59,764 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 21:18:03,381 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5700, loss[loss=0.1039, beats_loss=0.01517, ecapa_loss=0.0002992, whisper_loss=0.08573, over 17572.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01265, ecapa_loss=0.0003395, whisper_loss=0.1022, over 3891175.35 frames. ], batch size: 68, lr: 2.60e-02, grad_scale: 32768.0 2024-08-09 21:18:05,173 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 21:18:06,767 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 3.095e+01 3.448e+01 4.225e+01 7.062e+01, threshold=6.897e+01, percent-clipped=0.0 2024-08-09 21:18:11,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=201980.0, ans=0.125 2024-08-09 21:18:17,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.28 vs. limit=22.5 2024-08-09 21:18:21,161 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 21:18:43,472 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-09 21:18:55,751 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 21:19:03,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=202380.0, ans=0.0 2024-08-09 21:19:10,518 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5750, loss[loss=0.1368, beats_loss=0.01251, ecapa_loss=0.0002625, whisper_loss=0.1216, over 23579.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01278, ecapa_loss=0.0003397, whisper_loss=0.1014, over 3906175.48 frames. ], batch size: 89, lr: 2.60e-02, grad_scale: 32768.0 2024-08-09 21:19:20,830 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.62 vs. limit=22.5 2024-08-09 21:19:23,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=202580.0, ans=0.5 2024-08-09 21:19:30,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=202580.0, ans=0.0 2024-08-09 21:19:44,746 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 33 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-09 21:19:49,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=202680.0, ans=0.2 2024-08-09 21:20:01,797 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-09 21:20:08,896 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 21:20:17,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5800, loss[loss=0.1249, beats_loss=0.01047, ecapa_loss=0.0003338, whisper_loss=0.1111, over 21640.00 frames. 
], tot_loss[loss=0.1179, beats_loss=0.01275, ecapa_loss=0.0003397, whisper_loss=0.1018, over 3878736.11 frames. ], batch size: 83, lr: 2.60e-02, grad_scale: 32768.0 2024-08-09 21:20:19,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=202980.0, ans=0.1 2024-08-09 21:20:19,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=202980.0, ans=0.2 2024-08-09 21:20:20,440 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 3.100e+01 3.407e+01 4.370e+01 6.410e+01, threshold=6.814e+01, percent-clipped=0.0 2024-08-09 21:20:23,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=202980.0, ans=0.0 2024-08-09 21:20:24,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=202980.0, ans=0.0 2024-08-09 21:20:27,381 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 21:20:36,791 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 21:20:37,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203080.0, ans=0.1 2024-08-09 21:20:46,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=203180.0, ans=0.0 2024-08-09 21:20:46,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=203180.0, ans=0.035 2024-08-09 21:20:49,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=203180.0, ans=0.125 2024-08-09 21:20:56,647 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
20 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 21:21:00,689 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 21:21:03,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203280.0, ans=0.1 2024-08-09 21:21:19,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.43 vs. limit=15.0 2024-08-09 21:21:24,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=203480.0, ans=0.025 2024-08-09 21:21:24,874 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5850, loss[loss=0.1251, beats_loss=0.01094, ecapa_loss=0.0003801, whisper_loss=0.1103, over 21900.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01272, ecapa_loss=0.0003403, whisper_loss=0.1011, over 3852377.64 frames. ], batch size: 89, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:21:35,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=203480.0, ans=0.125 2024-08-09 21:21:38,684 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=12.0 2024-08-09 21:21:49,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=203580.0, ans=0.125 2024-08-09 21:21:49,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203580.0, ans=0.1 2024-08-09 21:21:52,867 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 21:21:53,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=203680.0, ans=0.0 2024-08-09 21:22:01,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=203680.0, ans=0.0 2024-08-09 21:22:03,531 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-09 21:22:07,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=203780.0, ans=0.125 2024-08-09 21:22:13,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=203780.0, ans=0.0 2024-08-09 21:22:13,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=203780.0, ans=0.125 2024-08-09 21:22:13,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=203780.0, ans=0.0 2024-08-09 21:22:31,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=203980.0, ans=15.0 2024-08-09 21:22:31,549 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5900, loss[loss=0.1051, beats_loss=0.01321, ecapa_loss=0.0003551, whisper_loss=0.08835, over 22313.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01282, ecapa_loss=0.000338, whisper_loss=0.1001, over 3864302.30 frames. ], batch size: 93, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:22:32,895 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 21:22:34,084 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 3.068e+01 3.370e+01 4.019e+01 7.434e+01, threshold=6.739e+01, percent-clipped=1.0 2024-08-09 21:22:39,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=203980.0, ans=0.0 2024-08-09 21:22:43,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=204080.0, ans=0.125 2024-08-09 21:22:48,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2024-08-09 21:22:52,246 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 9 from Vox, 38 fro AS 2024-08-09 21:23:01,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=204180.0, ans=0.0 2024-08-09 21:23:03,687 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-09 21:23:08,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=204180.0, ans=0.2 2024-08-09 21:23:25,911 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.133e-01 2024-08-09 21:23:38,822 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.94 vs. limit=10.0 2024-08-09 21:23:39,257 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 5950, loss[loss=0.1163, beats_loss=0.0136, ecapa_loss=0.0003401, whisper_loss=0.09926, over 18460.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01281, ecapa_loss=0.0003369, whisper_loss=0.09974, over 3835507.31 frames. 
], batch size: 73, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:23:53,874 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-09 21:23:55,145 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-09 21:24:44,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6000, loss[loss=0.1181, beats_loss=0.01206, ecapa_loss=0.0003254, whisper_loss=0.1027, over 22722.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01281, ecapa_loss=0.0003355, whisper_loss=0.1, over 3870801.27 frames. ], batch size: 89, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:24:44,483 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-09 21:25:26,024 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on ASR_libri: loss=0.2831, beats_loss=0, ecapa_loss=0.0009654, whisper_loss=0.2734, over 922467.00 frames. 2024-08-09 21:25:44,701 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on SV_voxceleb1: loss=0.008561, beats_loss=0, ecapa_loss=0.0008561, whisper_loss=0, over 939242.00 frames. 2024-08-09 21:27:41,246 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on AT_audioset: loss=0.03036, beats_loss=0.03036, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 21:27:41,251 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-09 21:27:42,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.73 vs. limit=22.5 2024-08-09 21:27:43,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.831e+01 3.333e+01 3.565e+01 5.881e+01, threshold=6.666e+01, percent-clipped=0.0 2024-08-09 21:27:44,111 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 40 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-09 21:27:49,393 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-09 21:27:51,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.37 vs. limit=10.0 2024-08-09 21:27:56,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=15.0 2024-08-09 21:28:15,058 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 21:28:30,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=205280.0, ans=0.0 2024-08-09 21:28:34,603 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.33 vs. limit=15.0 2024-08-09 21:28:35,309 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 21:28:45,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=205380.0, ans=0.125 2024-08-09 21:28:48,657 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6050, loss[loss=0.09252, beats_loss=0.01288, ecapa_loss=0.0003126, whisper_loss=0.07651, over 17553.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01281, ecapa_loss=0.000335, whisper_loss=0.09995, over 3878763.19 frames. ], batch size: 70, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:28:51,381 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-09 21:28:53,023 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.131e+03 2024-08-09 21:29:03,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=205580.0, ans=0.0 2024-08-09 21:29:07,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=205580.0, ans=0.125 2024-08-09 21:29:15,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=205680.0, ans=0.125 2024-08-09 21:29:23,489 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-08-09 21:29:32,444 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-09 21:29:41,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=205880.0, ans=0.125 2024-08-09 21:29:45,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=205880.0, ans=0.2 2024-08-09 21:29:54,935 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6100, loss[loss=0.1275, beats_loss=0.01264, ecapa_loss=0.0003395, whisper_loss=0.1115, over 20265.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01281, ecapa_loss=0.0003335, whisper_loss=0.1001, over 3883888.80 frames. 
], batch size: 82, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:29:57,835 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 3.058e+01 3.470e+01 4.090e+01 8.250e+01, threshold=6.939e+01, percent-clipped=1.0 2024-08-09 21:30:03,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=205980.0, ans=0.0 2024-08-09 21:30:16,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=206080.0, ans=0.1 2024-08-09 21:30:31,563 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-09 21:30:38,565 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=12.0 2024-08-09 21:30:45,398 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.94 vs. limit=15.0 2024-08-09 21:31:00,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=206380.0, ans=0.125 2024-08-09 21:31:03,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6150, loss[loss=0.09306, beats_loss=0.01565, ecapa_loss=0.0002674, whisper_loss=0.07474, over 18187.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01283, ecapa_loss=0.0003327, whisper_loss=0.1004, over 3901405.70 frames. ], batch size: 71, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:31:07,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=206480.0, ans=0.125 2024-08-09 21:31:16,984 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.01 vs. 
limit=22.5 2024-08-09 21:31:22,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=206580.0, ans=0.0 2024-08-09 21:31:53,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=206780.0, ans=0.2 2024-08-09 21:32:01,116 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-09 21:32:05,113 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 33 from Vox, 31 fro AS 2024-08-09 21:32:10,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6200, loss[loss=0.1148, beats_loss=0.01127, ecapa_loss=0.0003458, whisper_loss=0.1001, over 23017.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01276, ecapa_loss=0.0003358, whisper_loss=0.1009, over 3891820.35 frames. ], batch size: 91, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:32:13,172 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 3.042e+01 3.611e+01 4.258e+01 6.640e+01, threshold=7.222e+01, percent-clipped=0.0 2024-08-09 21:32:16,094 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 37 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 21:32:19,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=206980.0, ans=0.2 2024-08-09 21:32:24,592 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 21:32:27,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=207080.0, ans=0.125 2024-08-09 21:32:29,788 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-09 21:32:34,973 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
23 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-09 21:33:01,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=207280.0, ans=0.0 2024-08-09 21:33:04,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.58 vs. limit=6.0 2024-08-09 21:33:09,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=207380.0, ans=0.2 2024-08-09 21:33:18,284 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6250, loss[loss=0.1531, beats_loss=0.008945, ecapa_loss=0.0002745, whisper_loss=0.1414, over 20362.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01276, ecapa_loss=0.0003343, whisper_loss=0.1011, over 3908061.70 frames. ], batch size: 71, lr: 2.57e-02, grad_scale: 32768.0 2024-08-09 21:33:23,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=207480.0, ans=0.125 2024-08-09 21:33:23,402 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=6.581e-01 2024-08-09 21:34:18,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=207880.0, ans=0.0 2024-08-09 21:34:27,746 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6300, loss[loss=0.1134, beats_loss=0.01307, ecapa_loss=0.0003259, whisper_loss=0.09711, over 15091.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01265, ecapa_loss=0.0003342, whisper_loss=0.1014, over 3900818.75 frames. ], batch size: 59, lr: 2.57e-02, grad_scale: 32768.0 2024-08-09 21:34:30,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.893e+01 3.305e+01 3.810e+01 5.470e+01, threshold=6.610e+01, percent-clipped=0.0 2024-08-09 21:34:42,807 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
27 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-09 21:34:46,930 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-09 21:34:58,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.59 vs. limit=6.0 2024-08-09 21:35:04,202 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-09 21:35:35,765 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6350, loss[loss=0.1511, beats_loss=0.0107, ecapa_loss=0.0003758, whisper_loss=0.1367, over 17399.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01272, ecapa_loss=0.000334, whisper_loss=0.1011, over 3872090.67 frames. ], batch size: 66, lr: 2.57e-02, grad_scale: 32768.0 2024-08-09 21:35:35,965 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 21:35:37,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=208480.0, ans=0.0 2024-08-09 21:35:40,307 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=15.0 2024-08-09 21:35:52,113 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 21:35:53,755 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 11 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-09 21:35:56,601 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-09 21:35:59,299 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 21:36:02,307 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-09 21:36:11,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=208680.0, ans=0.0 2024-08-09 21:36:37,517 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 21:36:44,971 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6400, loss[loss=0.1283, beats_loss=0.01457, ecapa_loss=0.0003125, whisper_loss=0.1106, over 23595.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01272, ecapa_loss=0.0003335, whisper_loss=0.102, over 3882729.01 frames. ], batch size: 94, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:36:48,109 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 3.030e+01 3.423e+01 4.041e+01 6.749e+01, threshold=6.846e+01, percent-clipped=1.0 2024-08-09 21:37:05,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=209080.0, ans=0.125 2024-08-09 21:37:18,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=209180.0, ans=0.0 2024-08-09 21:37:29,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=209280.0, ans=0.125 2024-08-09 21:37:34,558 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-09 21:37:36,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=209280.0, ans=0.0 2024-08-09 21:37:44,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=209380.0, ans=0.125 2024-08-09 21:37:50,124 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 21:37:54,829 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6450, loss[loss=0.1241, beats_loss=0.01134, ecapa_loss=0.0003456, whisper_loss=0.1093, over 23642.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01273, ecapa_loss=0.000335, whisper_loss=0.1019, over 3916857.39 frames. ], batch size: 94, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:37:55,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=209480.0, ans=15.0 2024-08-09 21:38:09,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=209580.0, ans=22.5 2024-08-09 21:38:13,077 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 29 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-09 21:38:21,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=209680.0, ans=0.1 2024-08-09 21:38:26,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=209680.0, ans=0.125 2024-08-09 21:38:28,272 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-09 21:38:31,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=209680.0, ans=0.09899494936611666 2024-08-09 21:38:32,752 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.112e-03 2024-08-09 21:38:39,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209780.0, ans=0.1 2024-08-09 21:38:59,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=209880.0, ans=0.0 2024-08-09 21:39:04,633 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6500, loss[loss=0.1098, beats_loss=0.0137, ecapa_loss=0.000335, whisper_loss=0.09271, over 21254.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01256, ecapa_loss=0.0003352, whisper_loss=0.1031, over 3894001.50 frames. ], batch size: 88, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:39:05,993 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 21:39:07,348 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.878e+01 3.238e+01 3.656e+01 8.439e+01, threshold=6.476e+01, percent-clipped=1.0 2024-08-09 21:39:32,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=210180.0, ans=0.125 2024-08-09 21:39:48,697 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-09 21:40:03,240 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 32 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-09 21:40:14,259 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6550, loss[loss=0.1178, beats_loss=0.01243, ecapa_loss=0.0003992, whisper_loss=0.1013, over 20867.00 frames. 
], tot_loss[loss=0.1191, beats_loss=0.01256, ecapa_loss=0.0003338, whisper_loss=0.1032, over 3894200.58 frames. ], batch size: 90, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:40:33,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=210580.0, ans=0.05 2024-08-09 21:40:40,768 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 21:41:01,116 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 38 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-09 21:41:05,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=210780.0, ans=0.125 2024-08-09 21:41:11,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.02 vs. limit=15.0 2024-08-09 21:41:22,212 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6600, loss[loss=0.109, beats_loss=0.0125, ecapa_loss=0.0003374, whisper_loss=0.09317, over 22278.00 frames. ], tot_loss[loss=0.1192, beats_loss=0.01257, ecapa_loss=0.0003348, whisper_loss=0.1033, over 3924000.70 frames. ], batch size: 90, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:41:23,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=210980.0, ans=0.1 2024-08-09 21:41:24,839 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 3.037e+01 3.483e+01 4.077e+01 6.253e+01, threshold=6.966e+01, percent-clipped=0.0 2024-08-09 21:41:33,351 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
26 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 21:41:40,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211080.0, ans=0.1 2024-08-09 21:42:07,429 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 21:42:13,582 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.161e+00 2024-08-09 21:42:22,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=211380.0, ans=0.125 2024-08-09 21:42:23,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=211380.0, ans=0.125 2024-08-09 21:42:26,693 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.065e+03 2024-08-09 21:42:31,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6650, loss[loss=0.09559, beats_loss=0.0121, ecapa_loss=0.0003427, whisper_loss=0.08007, over 16985.00 frames. ], tot_loss[loss=0.1193, beats_loss=0.01257, ecapa_loss=0.0003361, whisper_loss=0.1034, over 3923306.88 frames. ], batch size: 68, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:42:33,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=211480.0, ans=0.2 2024-08-09 21:42:40,160 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-09 21:42:47,728 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-09 21:42:49,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2024-08-09 21:42:55,592 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-09 21:42:56,403 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.45 vs. limit=22.5 2024-08-09 21:42:56,829 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-09 21:43:03,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211680.0, ans=0.1 2024-08-09 21:43:06,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=211680.0, ans=0.125 2024-08-09 21:43:13,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=211780.0, ans=0.2 2024-08-09 21:43:33,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=211880.0, ans=0.1 2024-08-09 21:43:33,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2024-08-09 21:43:38,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6700, loss[loss=0.1393, beats_loss=0.01028, ecapa_loss=0.000327, whisper_loss=0.1258, over 23357.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01259, ecapa_loss=0.0003372, whisper_loss=0.1029, over 3913812.35 frames. 
], batch size: 88, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:43:41,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 3.049e+01 3.429e+01 4.303e+01 7.619e+01, threshold=6.858e+01, percent-clipped=1.0 2024-08-09 21:43:52,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=212080.0, ans=0.125 2024-08-09 21:43:54,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=212080.0, ans=15.0 2024-08-09 21:43:58,296 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 32 from Vox, 29 fro AS 2024-08-09 21:44:13,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=212180.0, ans=0.2 2024-08-09 21:44:47,793 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6750, loss[loss=0.1239, beats_loss=0.01088, ecapa_loss=0.0003714, whisper_loss=0.1093, over 20412.00 frames. ], tot_loss[loss=0.1189, beats_loss=0.01268, ecapa_loss=0.0003345, whisper_loss=0.1028, over 3933754.47 frames. ], batch size: 85, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:45:06,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.55 vs. limit=15.0 2024-08-09 21:45:08,359 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 13 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 21:45:16,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=212680.0, ans=0.125 2024-08-09 21:45:20,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=212680.0, ans=0.2 2024-08-09 21:45:23,914 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 21:45:25,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=212680.0, ans=0.125 2024-08-09 21:45:28,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=212780.0, ans=0.0 2024-08-09 21:45:34,313 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 21:45:37,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.71 vs. limit=22.5 2024-08-09 21:45:54,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.41 vs. limit=15.0 2024-08-09 21:45:56,209 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6800, loss[loss=0.1471, beats_loss=0.01262, ecapa_loss=0.0003495, whisper_loss=0.131, over 22256.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01268, ecapa_loss=0.0003342, whisper_loss=0.1027, over 3897446.09 frames. ], batch size: 89, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:45:57,739 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-09 21:45:58,798 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.928e+01 3.409e+01 4.100e+01 8.566e+01, threshold=6.819e+01, percent-clipped=2.0 2024-08-09 21:46:00,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=212980.0, ans=0.125 2024-08-09 21:46:01,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.64 vs. 
limit=15.0 2024-08-09 21:46:02,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=212980.0, ans=0.125 2024-08-09 21:46:10,542 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=1.93 vs. limit=15.0 2024-08-09 21:46:16,273 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 21:46:19,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=213080.0, ans=0.0 2024-08-09 21:46:28,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=213180.0, ans=0.0 2024-08-09 21:46:40,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=213280.0, ans=0.1 2024-08-09 21:46:42,586 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0 2024-08-09 21:47:00,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=213380.0, ans=0.125 2024-08-09 21:47:03,699 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6850, loss[loss=0.1088, beats_loss=0.01574, ecapa_loss=0.0002414, whisper_loss=0.09068, over 18057.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01262, ecapa_loss=0.0003342, whisper_loss=0.1028, over 3878907.93 frames. ], batch size: 70, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:47:06,963 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.155e-02 2024-08-09 21:47:17,186 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
25 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-09 21:47:31,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=213680.0, ans=0.125 2024-08-09 21:47:34,231 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.66 vs. limit=10.0 2024-08-09 21:47:42,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.69 vs. limit=10.0 2024-08-09 21:47:43,435 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.04 vs. limit=15.0 2024-08-09 21:48:10,959 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6900, loss[loss=0.1066, beats_loss=0.01395, ecapa_loss=0.0004163, whisper_loss=0.08845, over 16310.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01267, ecapa_loss=0.0003343, whisper_loss=0.1025, over 3863568.92 frames. ], batch size: 70, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:48:13,990 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 3.002e+01 3.455e+01 4.166e+01 7.035e+01, threshold=6.909e+01, percent-clipped=1.0 2024-08-09 21:48:18,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=213980.0, ans=0.125 2024-08-09 21:48:26,654 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-09 21:48:33,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=214080.0, ans=0.0 2024-08-09 21:48:38,534 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 19 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-09 21:48:39,782 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-09 21:48:57,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=214280.0, ans=0.125 2024-08-09 21:48:58,370 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-09 21:49:01,332 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.742e+00 2024-08-09 21:49:05,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=214380.0, ans=0.125 2024-08-09 21:49:17,845 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 6950, loss[loss=0.1204, beats_loss=0.01279, ecapa_loss=0.0002964, whisper_loss=0.1047, over 19772.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.0126, ecapa_loss=0.0003322, whisper_loss=0.1028, over 3859225.73 frames. ], batch size: 78, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:49:19,408 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-09 21:49:19,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=214480.0, ans=0.125 2024-08-09 21:49:20,335 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.18 vs. limit=10.0 2024-08-09 21:49:23,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=214480.0, ans=0.0 2024-08-09 21:49:29,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=214480.0, ans=0.0 2024-08-09 21:49:30,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.05 vs. 
limit=22.5 2024-08-09 21:49:31,655 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-09 21:49:44,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214680.0, ans=0.1 2024-08-09 21:49:52,262 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 9 from Vox, 32 fro AS 2024-08-09 21:49:57,302 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 10 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-09 21:50:04,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=214780.0, ans=0.125 2024-08-09 21:50:24,363 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7000, loss[loss=0.1179, beats_loss=0.01033, ecapa_loss=0.0002809, whisper_loss=0.1048, over 18330.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01255, ecapa_loss=0.0003317, whisper_loss=0.1031, over 3872171.72 frames. ], batch size: 68, lr: 2.53e-02, grad_scale: 32768.0 2024-08-09 21:50:26,105 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-09 21:50:26,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=214980.0, ans=0.07 2024-08-09 21:50:27,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.842e+01 3.336e+01 4.058e+01 9.243e+01, threshold=6.672e+01, percent-clipped=2.0 2024-08-09 21:50:30,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=214980.0, ans=0.0 2024-08-09 21:50:51,023 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
16 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-09 21:50:55,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=215180.0, ans=0.125 2024-08-09 21:51:03,830 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-08-09 21:51:19,185 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-09 21:51:20,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215380.0, ans=0.1 2024-08-09 21:51:25,919 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-09 21:51:33,439 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7050, loss[loss=0.158, beats_loss=0.009857, ecapa_loss=0.0003535, whisper_loss=0.1446, over 18793.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01264, ecapa_loss=0.0003302, whisper_loss=0.1025, over 3872151.70 frames. 
], batch size: 73, lr: 2.53e-02, grad_scale: 32768.0 2024-08-09 21:51:33,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=215480.0, ans=0.5 2024-08-09 21:51:53,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=215580.0, ans=0.0 2024-08-09 21:52:03,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=215680.0, ans=0.1 2024-08-09 21:52:10,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=215680.0, ans=0.015 2024-08-09 21:52:25,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=215780.0, ans=0.125 2024-08-09 21:52:37,855 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 21:52:41,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7100, loss[loss=0.1178, beats_loss=0.01316, ecapa_loss=0.0002739, whisper_loss=0.1019, over 18271.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01269, ecapa_loss=0.0003306, whisper_loss=0.1008, over 3824278.85 frames. 
], batch size: 68, lr: 2.53e-02, grad_scale: 32768.0 2024-08-09 21:52:43,994 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.849e+01 3.267e+01 3.796e+01 6.737e+01, threshold=6.534e+01, percent-clipped=1.0 2024-08-09 21:52:44,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215980.0, ans=0.1 2024-08-09 21:52:50,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=215980.0, ans=0.125 2024-08-09 21:52:54,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=216080.0, ans=0.1 2024-08-09 21:52:54,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=216080.0, ans=0.125 2024-08-09 21:53:08,910 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 21:53:11,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=216180.0, ans=0.0 2024-08-09 21:53:18,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=216180.0, ans=0.1 2024-08-09 21:53:31,316 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 33 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-09 21:53:46,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=216380.0, ans=0.2 2024-08-09 21:53:46,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216380.0, ans=0.1 2024-08-09 21:53:48,262 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7150, loss[loss=0.1342, beats_loss=0.0111, ecapa_loss=0.0002964, whisper_loss=0.1201, over 24252.00 frames. 
], tot_loss[loss=0.1174, beats_loss=0.0126, ecapa_loss=0.0003305, whisper_loss=0.1015, over 3877835.89 frames. ], batch size: 91, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:53:54,200 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 21:53:56,897 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-09 21:54:10,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=216580.0, ans=0.04949747468305833 2024-08-09 21:54:14,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=216680.0, ans=0.05 2024-08-09 21:54:26,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=216680.0, ans=0.125 2024-08-09 21:54:28,843 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 21:54:38,140 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 33 from Vox, 31 fro AS 2024-08-09 21:54:47,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=216880.0, ans=0.125 2024-08-09 21:54:54,667 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7200, loss[loss=0.1223, beats_loss=0.01422, ecapa_loss=0.0003452, whisper_loss=0.1047, over 22073.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01258, ecapa_loss=0.0003297, whisper_loss=0.1022, over 3893082.43 frames. 
], batch size: 91, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:54:57,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+01 3.192e+01 3.694e+01 4.293e+01 6.634e+01, threshold=7.388e+01, percent-clipped=1.0 2024-08-09 21:55:08,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=217080.0, ans=15.0 2024-08-09 21:55:18,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=217080.0, ans=10.0 2024-08-09 21:55:20,203 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-09 21:55:23,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=217180.0, ans=0.125 2024-08-09 21:55:30,946 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 21:55:34,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2024-08-09 21:55:40,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217280.0, ans=0.1 2024-08-09 21:55:48,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=217380.0, ans=0.125 2024-08-09 21:55:53,446 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 21:56:00,740 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7250, loss[loss=0.1253, beats_loss=0.01195, ecapa_loss=0.0003102, whisper_loss=0.1103, over 18711.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01252, ecapa_loss=0.0003309, whisper_loss=0.1022, over 3913807.63 frames. 
], batch size: 73, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:56:02,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=217480.0, ans=0.125 2024-08-09 21:56:05,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=217480.0, ans=0.0 2024-08-09 21:56:37,184 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-09 21:56:55,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217880.0, ans=0.1 2024-08-09 21:57:07,506 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7300, loss[loss=0.1381, beats_loss=0.01087, ecapa_loss=0.000321, whisper_loss=0.124, over 22940.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01256, ecapa_loss=0.0003307, whisper_loss=0.1028, over 3937156.80 frames. ], batch size: 90, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:57:10,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 3.021e+01 3.524e+01 4.153e+01 7.749e+01, threshold=7.049e+01, percent-clipped=1.0 2024-08-09 21:57:23,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=218080.0, ans=0.0 2024-08-09 21:57:32,313 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-09 21:57:39,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=218180.0, ans=0.2 2024-08-09 21:57:55,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=218280.0, ans=0.1 2024-08-09 21:58:06,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=218380.0, ans=0.125 2024-08-09 21:58:09,950 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 21:58:11,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=218380.0, ans=0.0 2024-08-09 21:58:12,714 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 14 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 21:58:15,146 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7350, loss[loss=0.1156, beats_loss=0.01419, ecapa_loss=0.0003923, whisper_loss=0.09752, over 20135.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01251, ecapa_loss=0.0003303, whisper_loss=0.1027, over 3913151.35 frames. ], batch size: 86, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 21:58:24,487 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-09 21:58:25,862 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-09 21:58:28,370 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-09 21:58:48,111 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-09 21:58:56,510 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-09 21:58:57,938 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
12 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 21:58:59,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=218780.0, ans=0.125 2024-08-09 21:59:11,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=218880.0, ans=0.125 2024-08-09 21:59:13,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=218880.0, ans=0.125 2024-08-09 21:59:15,907 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2024-08-09 21:59:22,021 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7400, loss[loss=0.117, beats_loss=0.01514, ecapa_loss=0.0002731, whisper_loss=0.0991, over 22033.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.0126, ecapa_loss=0.0003294, whisper_loss=0.102, over 3918826.70 frames. ], batch size: 85, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 21:59:24,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.912e+01 3.245e+01 3.982e+01 7.444e+01, threshold=6.489e+01, percent-clipped=1.0 2024-08-09 21:59:31,835 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-08-09 21:59:39,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=219080.0, ans=0.125 2024-08-09 21:59:53,844 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 22:00:05,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=219280.0, ans=0.125 2024-08-09 22:00:07,913 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-09 22:00:11,078 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.76 vs. limit=22.5 2024-08-09 22:00:24,746 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 33 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 22:00:27,318 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7450, loss[loss=0.109, beats_loss=0.01065, ecapa_loss=0.0003707, whisper_loss=0.09463, over 13847.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01257, ecapa_loss=0.0003291, whisper_loss=0.102, over 3891622.09 frames. ], batch size: 54, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 22:00:27,569 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-09 22:00:35,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=219480.0, ans=0.125 2024-08-09 22:00:50,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=219580.0, ans=0.125 2024-08-09 22:00:52,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=219680.0, ans=0.125 2024-08-09 22:01:19,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=219880.0, ans=0.2 2024-08-09 22:01:23,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.96 vs. limit=22.5 2024-08-09 22:01:24,610 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 22:01:25,739 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-09 22:01:28,202 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 16 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 22:01:29,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=219880.0, ans=0.2 2024-08-09 22:01:31,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=219980.0, ans=0.125 2024-08-09 22:01:32,130 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7500, loss[loss=0.1154, beats_loss=0.01405, ecapa_loss=0.0002902, whisper_loss=0.09842, over 19419.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01261, ecapa_loss=0.0003294, whisper_loss=0.1017, over 3884982.80 frames. ], batch size: 77, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 22:01:34,772 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.511e+01 3.195e+01 3.556e+01 4.126e+01 6.406e+01, threshold=7.112e+01, percent-clipped=0.0 2024-08-09 22:01:41,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=219980.0, ans=0.125 2024-08-09 22:01:41,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=219980.0, ans=0.0 2024-08-09 22:01:43,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=219980.0, ans=0.2 2024-08-09 22:02:05,474 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 22:02:09,273 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 22:02:15,332 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.34 vs. limit=22.5 2024-08-09 22:02:21,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.33 vs. limit=10.0 2024-08-09 22:02:24,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.54 vs. limit=10.0 2024-08-09 22:02:26,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=220380.0, ans=0.125 2024-08-09 22:02:38,682 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7550, loss[loss=0.1146, beats_loss=0.01388, ecapa_loss=0.000314, whisper_loss=0.0976, over 18213.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01261, ecapa_loss=0.0003298, whisper_loss=0.102, over 3876249.62 frames. ], batch size: 74, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:02:50,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=220580.0, ans=0.0 2024-08-09 22:02:52,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=220580.0, ans=0.0 2024-08-09 22:02:56,686 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 22:03:01,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=220580.0, ans=0.125 2024-08-09 22:03:02,283 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
31 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-09 22:03:02,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=220580.0, ans=0.125 2024-08-09 22:03:06,832 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-09 22:03:18,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.96 vs. limit=15.0 2024-08-09 22:03:20,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220780.0, ans=0.1 2024-08-09 22:03:22,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=220780.0, ans=0.1 2024-08-09 22:03:43,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7600, loss[loss=0.1051, beats_loss=0.01575, ecapa_loss=0.0003552, whisper_loss=0.08577, over 17283.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.0126, ecapa_loss=0.0003306, whisper_loss=0.1017, over 3874181.46 frames. ], batch size: 70, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:03:46,361 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.898e+01 3.243e+01 3.786e+01 9.374e+01, threshold=6.487e+01, percent-clipped=2.0 2024-08-09 22:03:46,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=220980.0, ans=0.0 2024-08-09 22:04:02,128 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2024-08-09 22:04:15,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=221180.0, ans=0.125 2024-08-09 22:04:16,715 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 22:04:43,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=221380.0, ans=0.0 2024-08-09 22:04:48,863 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.74 vs. limit=15.0 2024-08-09 22:04:51,730 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7650, loss[loss=0.1137, beats_loss=0.01262, ecapa_loss=0.000354, whisper_loss=0.09756, over 18145.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01258, ecapa_loss=0.0003316, whisper_loss=0.1015, over 3869659.14 frames. ], batch size: 73, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:04:52,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=221480.0, ans=0.2 2024-08-09 22:04:53,030 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 22:05:01,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=221480.0, ans=0.04949747468305833 2024-08-09 22:05:03,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=221480.0, ans=0.0 2024-08-09 22:05:08,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=221580.0, ans=0.125 2024-08-09 22:05:08,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=221580.0, ans=0.0 2024-08-09 22:05:11,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=221580.0, ans=0.95 2024-08-09 22:05:15,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, 
batch_count=221580.0, ans=0.1 2024-08-09 22:05:30,779 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-08-09 22:05:42,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.85 vs. limit=15.0 2024-08-09 22:05:50,986 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.865e+00 2024-08-09 22:06:13,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7700, loss[loss=0.123, beats_loss=0.01395, ecapa_loss=0.0002635, whisper_loss=0.1065, over 22889.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01255, ecapa_loss=0.0003311, whisper_loss=0.1013, over 3876112.63 frames. ], batch size: 89, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:06:15,929 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.870e+01 3.289e+01 3.671e+01 6.131e+01, threshold=6.578e+01, percent-clipped=0.0 2024-08-09 22:06:22,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=221980.0, ans=0.0 2024-08-09 22:06:28,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=221980.0, ans=0.125 2024-08-09 22:06:32,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=222080.0, ans=0.2 2024-08-09 22:06:38,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=222080.0, ans=0.0 2024-08-09 22:06:41,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=222080.0, ans=10.0 2024-08-09 22:06:44,576 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=222080.0, ans=15.0 2024-08-09 22:07:19,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=222280.0, ans=0.125 2024-08-09 22:07:59,406 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7750, loss[loss=0.1306, beats_loss=0.0125, ecapa_loss=0.0003134, whisper_loss=0.115, over 19386.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01255, ecapa_loss=0.0003364, whisper_loss=0.1016, over 3895309.41 frames. ], batch size: 76, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:08:07,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=222480.0, ans=0.125 2024-08-09 22:08:10,677 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 22:08:14,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=222480.0, ans=0.125 2024-08-09 22:08:24,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=222580.0, ans=0.1 2024-08-09 22:08:55,201 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 22:09:16,390 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7800, loss[loss=0.1151, beats_loss=0.01047, ecapa_loss=0.0003723, whisper_loss=0.1009, over 14267.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01251, ecapa_loss=0.0003368, whisper_loss=0.1015, over 3861060.76 frames. 
], batch size: 56, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:09:19,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 3.196e+01 3.636e+01 4.618e+01 8.254e+01, threshold=7.273e+01, percent-clipped=2.0 2024-08-09 22:09:27,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=222980.0, ans=0.09899494936611666 2024-08-09 22:09:30,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=223080.0, ans=0.2 2024-08-09 22:09:46,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=223180.0, ans=0.125 2024-08-09 22:09:47,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=223180.0, ans=0.125 2024-08-09 22:09:56,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=223180.0, ans=0.125 2024-08-09 22:09:59,978 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2024-08-09 22:10:04,817 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5 2024-08-09 22:10:10,653 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2024-08-09 22:10:16,960 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-08-09 22:10:17,768 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 22:10:32,560 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7850, loss[loss=0.09659, beats_loss=0.01491, ecapa_loss=0.0002859, whisper_loss=0.07882, over 14301.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.0126, ecapa_loss=0.0003348, whisper_loss=0.101, over 3882721.72 frames. ], batch size: 56, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:10:58,117 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-09 22:11:01,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=223680.0, ans=0.07 2024-08-09 22:11:05,617 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 22:11:05,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=223680.0, ans=0.0 2024-08-09 22:11:11,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=223680.0, ans=0.1 2024-08-09 22:11:21,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=223780.0, ans=0.1 2024-08-09 22:11:29,768 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-09 22:11:37,181 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-09 22:11:41,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=223880.0, ans=0.2 2024-08-09 22:11:47,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7900, loss[loss=0.1375, beats_loss=0.01005, ecapa_loss=0.0003395, whisper_loss=0.124, over 14363.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01268, ecapa_loss=0.0003337, whisper_loss=0.1009, over 3852082.69 frames. 
], batch size: 55, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:11:50,370 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.925e+01 3.324e+01 4.014e+01 6.320e+01, threshold=6.647e+01, percent-clipped=0.0 2024-08-09 22:11:56,202 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-09 22:12:16,448 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-09 22:12:25,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=224180.0, ans=0.0 2024-08-09 22:12:25,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=224180.0, ans=0.0 2024-08-09 22:12:30,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=224180.0, ans=0.125 2024-08-09 22:12:39,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=224280.0, ans=0.0 2024-08-09 22:12:45,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=224280.0, ans=0.125 2024-08-09 22:13:00,077 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.474e-01 2024-08-09 22:13:01,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=224380.0, ans=0.1 2024-08-09 22:13:06,391 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 7950, loss[loss=0.1237, beats_loss=0.01131, ecapa_loss=0.000346, whisper_loss=0.1089, over 18035.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01264, ecapa_loss=0.0003321, whisper_loss=0.1014, over 3851995.67 frames. 
], batch size: 71, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:13:08,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0 2024-08-09 22:13:13,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=224480.0, ans=0.125 2024-08-09 22:13:19,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=224580.0, ans=0.04949747468305833 2024-08-09 22:13:22,650 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 13 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-09 22:13:47,807 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-09 22:13:49,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=224780.0, ans=0.125 2024-08-09 22:13:59,182 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.41 vs. limit=22.5 2024-08-09 22:14:13,366 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-09 22:14:20,685 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8000, loss[loss=0.1076, beats_loss=0.01302, ecapa_loss=0.0003639, whisper_loss=0.09098, over 21591.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01259, ecapa_loss=0.0003305, whisper_loss=0.1016, over 3881127.23 frames. ], batch size: 91, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:14:23,606 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.98 vs. 
limit=12.0 2024-08-09 22:14:23,867 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 3.124e+01 3.387e+01 3.961e+01 6.094e+01, threshold=6.774e+01, percent-clipped=0.0 2024-08-09 22:14:27,085 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 22:14:36,041 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 22:14:37,304 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-09 22:14:48,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=225080.0, ans=0.125 2024-08-09 22:15:04,769 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 28 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-09 22:15:22,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0 2024-08-09 22:15:35,243 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8050, loss[loss=0.1236, beats_loss=0.01173, ecapa_loss=0.0003715, whisper_loss=0.1082, over 22964.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01255, ecapa_loss=0.00033, whisper_loss=0.1017, over 3877532.78 frames. ], batch size: 95, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:15:40,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0 2024-08-09 22:15:40,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.96 vs. 
limit=22.5 2024-08-09 22:15:47,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=225480.0, ans=0.0 2024-08-09 22:16:28,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=225780.0, ans=0.07 2024-08-09 22:16:45,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=225880.0, ans=0.1 2024-08-09 22:16:46,280 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-09 22:16:49,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=225980.0, ans=0.05 2024-08-09 22:16:50,606 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8100, loss[loss=0.1091, beats_loss=0.01545, ecapa_loss=0.0002531, whisper_loss=0.09115, over 23070.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.0126, ecapa_loss=0.0003271, whisper_loss=0.1012, over 3867535.18 frames. ], batch size: 92, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:16:53,627 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 2.949e+01 3.347e+01 3.946e+01 6.724e+01, threshold=6.694e+01, percent-clipped=0.0 2024-08-09 22:16:55,312 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-09 22:16:59,148 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=12.0 2024-08-09 22:17:09,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226080.0, ans=0.1 2024-08-09 22:17:09,519 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.81 vs. 
limit=22.5 2024-08-09 22:17:30,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=226180.0, ans=0.125 2024-08-09 22:17:32,649 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-09 22:17:32,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=226180.0, ans=0.0 2024-08-09 22:17:45,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2024-08-09 22:17:56,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=226380.0, ans=12.0 2024-08-09 22:18:02,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=226380.0, ans=0.07 2024-08-09 22:18:05,944 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8150, loss[loss=0.09818, beats_loss=0.01355, ecapa_loss=0.0003107, whisper_loss=0.08153, over 17344.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01257, ecapa_loss=0.0003297, whisper_loss=0.1009, over 3900204.83 frames. ], batch size: 67, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:18:09,294 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 23 from Vox, 17 fro AS 2024-08-09 22:18:19,762 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-09 22:18:21,317 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-09 22:18:23,274 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=15.0 2024-08-09 22:18:33,185 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 22:18:37,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=226680.0, ans=15.0 2024-08-09 22:18:45,170 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-09 22:18:50,660 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.39 vs. limit=22.5 2024-08-09 22:19:13,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=226880.0, ans=0.0 2024-08-09 22:19:23,130 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8200, loss[loss=0.1133, beats_loss=0.01161, ecapa_loss=0.0003198, whisper_loss=0.09854, over 16189.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01269, ecapa_loss=0.0003276, whisper_loss=0.101, over 3898074.29 frames. ], batch size: 62, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:19:25,742 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+01 3.072e+01 3.518e+01 4.235e+01 6.207e+01, threshold=7.036e+01, percent-clipped=0.0 2024-08-09 22:19:30,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=226980.0, ans=0.125 2024-08-09 22:19:31,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=226980.0, ans=0.0 2024-08-09 22:19:37,567 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-09 22:19:40,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=227080.0, ans=0.125 2024-08-09 22:19:49,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.03 vs. 
limit=15.0 2024-08-09 22:20:03,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=227180.0, ans=0.125 2024-08-09 22:20:04,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=227180.0, ans=0.0 2024-08-09 22:20:08,889 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-09 22:20:30,000 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 26 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-09 22:20:38,032 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-09 22:20:39,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8250, loss[loss=0.1024, beats_loss=0.009465, ecapa_loss=0.0003193, whisper_loss=0.08971, over 14156.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01272, ecapa_loss=0.0003283, whisper_loss=0.1004, over 3903831.70 frames. ], batch size: 54, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:20:45,809 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 33 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-09 22:20:46,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=227480.0, ans=0.05 2024-08-09 22:20:52,031 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-09 22:21:09,127 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
17 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-09 22:21:16,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=227680.0, ans=0.2 2024-08-09 22:21:33,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=227780.0, ans=0.125 2024-08-09 22:21:47,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227880.0, ans=0.1 2024-08-09 22:21:56,264 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8300, loss[loss=0.1233, beats_loss=0.01161, ecapa_loss=0.0003353, whisper_loss=0.1083, over 19154.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01262, ecapa_loss=0.0003277, whisper_loss=0.1009, over 3893696.63 frames. ], batch size: 76, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:21:58,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=227980.0, ans=0.0 2024-08-09 22:21:59,104 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.129e+01 2.849e+01 3.182e+01 3.709e+01 5.211e+01, threshold=6.363e+01, percent-clipped=0.0 2024-08-09 22:22:00,505 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 22:22:24,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=228180.0, ans=0.125 2024-08-09 22:22:31,146 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
21 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-09 22:22:32,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=228180.0, ans=0.125 2024-08-09 22:22:36,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=228180.0, ans=0.125 2024-08-09 22:22:37,112 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-09 22:22:41,962 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 22:22:42,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=228280.0, ans=0.125 2024-08-09 22:22:57,355 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-09 22:23:01,338 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-09 22:23:07,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=228380.0, ans=0.125 2024-08-09 22:23:10,162 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8350, loss[loss=0.09074, beats_loss=0.01468, ecapa_loss=0.0002857, whisper_loss=0.0732, over 21667.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01277, ecapa_loss=0.000326, whisper_loss=0.1004, over 3918559.43 frames. ], batch size: 90, lr: 2.46e-02, grad_scale: 65536.0 2024-08-09 22:23:17,813 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-09 22:23:20,364 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 22:23:20,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=228480.0, ans=0.125 2024-08-09 22:23:24,631 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 16 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-09 22:23:25,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=228580.0, ans=0.0 2024-08-09 22:23:28,137 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 22:23:41,796 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-09 22:23:42,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=228680.0, ans=0.0 2024-08-09 22:23:51,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=228680.0, ans=0.5 2024-08-09 22:24:12,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=228880.0, ans=0.1 2024-08-09 22:24:26,815 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8400, loss[loss=0.1275, beats_loss=0.01136, ecapa_loss=0.0003214, whisper_loss=0.113, over 18598.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01283, ecapa_loss=0.0003254, whisper_loss=0.0999, over 3932241.88 frames. ], batch size: 74, lr: 2.46e-02, grad_scale: 65536.0 2024-08-09 22:24:29,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.962e+01 3.410e+01 4.213e+01 6.836e+01, threshold=6.819e+01, percent-clipped=3.0 2024-08-09 22:24:31,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=228980.0, ans=0.07 2024-08-09 22:24:34,269 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
24 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-09 22:24:41,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=229080.0, ans=0.0 2024-08-09 22:24:45,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=229080.0, ans=0.1 2024-08-09 22:24:50,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=229080.0, ans=0.125 2024-08-09 22:24:52,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.31 vs. limit=15.0 2024-08-09 22:24:58,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=229180.0, ans=0.125 2024-08-09 22:24:58,954 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 22:25:07,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=12.0 2024-08-09 22:25:13,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=229280.0, ans=0.04949747468305833 2024-08-09 22:25:14,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=229280.0, ans=0.125 2024-08-09 22:25:16,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=229280.0, ans=0.0 2024-08-09 22:25:18,876 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-09 22:25:42,067 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8450, loss[loss=0.0982, beats_loss=0.01615, ecapa_loss=0.0003056, whisper_loss=0.079, over 21752.00 frames. 
], tot_loss[loss=0.1156, beats_loss=0.01279, ecapa_loss=0.0003257, whisper_loss=0.09959, over 3905876.75 frames. ], batch size: 91, lr: 2.46e-02, grad_scale: 65536.0 2024-08-09 22:26:00,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=229580.0, ans=0.2 2024-08-09 22:26:02,754 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 22:26:11,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=229680.0, ans=0.125 2024-08-09 22:26:34,182 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 22:26:36,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=229780.0, ans=0.0 2024-08-09 22:26:38,739 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=35.12 vs. limit=15.0 2024-08-09 22:26:55,491 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 22:26:58,013 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8500, loss[loss=0.1189, beats_loss=0.0125, ecapa_loss=0.0003641, whisper_loss=0.1028, over 16939.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01274, ecapa_loss=0.0003247, whisper_loss=0.1003, over 3910958.09 frames. 
], batch size: 67, lr: 2.46e-02, grad_scale: 65536.0 2024-08-09 22:26:58,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=229980.0, ans=15.0 2024-08-09 22:27:00,846 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.472e+01 3.102e+01 3.448e+01 4.001e+01 5.719e+01, threshold=6.896e+01, percent-clipped=0.0 2024-08-09 22:27:04,707 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2024-08-09 22:27:11,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=230080.0, ans=0.0 2024-08-09 22:27:33,210 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.27 vs. limit=10.0 2024-08-09 22:27:37,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=230180.0, ans=0.125 2024-08-09 22:27:43,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230280.0, ans=0.1 2024-08-09 22:27:53,847 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-09 22:27:55,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=230280.0, ans=0.125 2024-08-09 22:28:12,707 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8550, loss[loss=0.09899, beats_loss=0.011, ecapa_loss=0.0003889, whisper_loss=0.08409, over 13518.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01266, ecapa_loss=0.0003259, whisper_loss=0.1006, over 3921657.35 frames. ], batch size: 55, lr: 2.45e-02, grad_scale: 65536.0 2024-08-09 22:28:12,961 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-09 22:28:47,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=230680.0, ans=0.125 2024-08-09 22:28:54,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=230680.0, ans=0.02 2024-08-09 22:29:07,503 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-09 22:29:12,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=230880.0, ans=0.125 2024-08-09 22:29:24,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5 2024-08-09 22:29:26,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8600, loss[loss=0.1272, beats_loss=0.01311, ecapa_loss=0.0002876, whisper_loss=0.1112, over 16698.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01259, ecapa_loss=0.0003243, whisper_loss=0.1017, over 3899980.08 frames. ], batch size: 65, lr: 2.45e-02, grad_scale: 65536.0 2024-08-09 22:29:29,296 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.896e+01 3.419e+01 4.251e+01 8.504e+01, threshold=6.839e+01, percent-clipped=1.0 2024-08-09 22:29:31,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=230980.0, ans=0.125 2024-08-09 22:29:33,740 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
18 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-09 22:29:34,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=230980.0, ans=0.125 2024-08-09 22:29:56,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=231180.0, ans=0.125 2024-08-09 22:30:12,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=231280.0, ans=0.07 2024-08-09 22:30:14,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=231280.0, ans=0.0 2024-08-09 22:30:15,848 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 22 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-09 22:30:17,151 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-09 22:30:29,677 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.08 vs. limit=15.0 2024-08-09 22:30:38,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=231480.0, ans=0.125 2024-08-09 22:30:39,388 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8650, loss[loss=0.1208, beats_loss=0.01247, ecapa_loss=0.0002479, whisper_loss=0.1059, over 16228.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01262, ecapa_loss=0.0003246, whisper_loss=0.1022, over 3909504.26 frames. 
], batch size: 59, lr: 2.45e-02, grad_scale: 65536.0 2024-08-09 22:30:39,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=231480.0, ans=0.125 2024-08-09 22:30:42,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=231480.0, ans=0.125 2024-08-09 22:31:16,720 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 22:31:23,249 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.70 vs. limit=15.0 2024-08-09 22:31:26,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231780.0, ans=0.1 2024-08-09 22:31:44,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=231880.0, ans=0.0 2024-08-09 22:31:51,615 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8700, loss[loss=0.125, beats_loss=0.01286, ecapa_loss=0.0003948, whisper_loss=0.1082, over 15286.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01267, ecapa_loss=0.0003255, whisper_loss=0.1018, over 3894306.15 frames. ], batch size: 65, lr: 2.45e-02, grad_scale: 65536.0 2024-08-09 22:31:53,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=231980.0, ans=0.125 2024-08-09 22:31:54,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 3.097e+01 3.569e+01 4.188e+01 5.734e+01, threshold=7.139e+01, percent-clipped=0.0 2024-08-09 22:31:57,210 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. 
limit=6.0 2024-08-09 22:32:07,906 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-09 22:32:24,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=232180.0, ans=0.0 2024-08-09 22:32:37,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=232280.0, ans=0.0 2024-08-09 22:32:50,524 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.09 vs. limit=22.5 2024-08-09 22:33:07,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8750, loss[loss=0.1468, beats_loss=0.007852, ecapa_loss=0.0002934, whisper_loss=0.136, over 17433.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01268, ecapa_loss=0.0003236, whisper_loss=0.1015, over 3883541.51 frames. ], batch size: 61, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:33:14,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=12.0 2024-08-09 22:33:33,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=232580.0, ans=0.0 2024-08-09 22:33:40,882 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-09 22:33:49,597 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 22:33:50,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.36 vs. limit=22.5 2024-08-09 22:34:00,178 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 22:34:00,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=232780.0, ans=0.2 2024-08-09 22:34:06,601 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.68 vs. limit=22.5 2024-08-09 22:34:16,169 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 22:34:19,989 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8800, loss[loss=0.1237, beats_loss=0.01311, ecapa_loss=0.0002701, whisper_loss=0.1079, over 20698.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01268, ecapa_loss=0.0003234, whisper_loss=0.1019, over 3885653.96 frames. ], batch size: 80, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:34:23,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+01 3.102e+01 3.612e+01 4.206e+01 6.577e+01, threshold=7.224e+01, percent-clipped=0.0 2024-08-09 22:34:33,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=232980.0, ans=0.125 2024-08-09 22:34:40,286 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-09 22:34:45,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. 
limit=15.0 2024-08-09 22:34:49,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=233180.0, ans=0.0 2024-08-09 22:35:02,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=233180.0, ans=0.125 2024-08-09 22:35:16,476 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.41 vs. limit=22.5 2024-08-09 22:35:23,513 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-09 22:35:23,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=233380.0, ans=0.0 2024-08-09 22:35:28,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=233380.0, ans=0.1 2024-08-09 22:35:34,807 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8850, loss[loss=0.113, beats_loss=0.01274, ecapa_loss=0.0003359, whisper_loss=0.09689, over 23013.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01268, ecapa_loss=0.0003236, whisper_loss=0.1016, over 3897705.03 frames. ], batch size: 95, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:35:39,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=233480.0, ans=0.125 2024-08-09 22:35:42,329 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 40 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-09 22:35:43,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.36 vs. limit=22.5 2024-08-09 22:35:57,646 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
35 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 22:35:59,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=233580.0, ans=0.5 2024-08-09 22:36:03,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=233680.0, ans=0.0 2024-08-09 22:36:05,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=233680.0, ans=0.125 2024-08-09 22:36:37,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=233880.0, ans=0.2 2024-08-09 22:36:45,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8900, loss[loss=0.1109, beats_loss=0.01191, ecapa_loss=0.0002422, whisper_loss=0.09661, over 16782.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01265, ecapa_loss=0.0003231, whisper_loss=0.1011, over 3871811.06 frames. ], batch size: 60, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:36:45,878 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.08 vs. limit=22.5 2024-08-09 22:36:47,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.807e+01 3.249e+01 3.699e+01 6.208e+01, threshold=6.498e+01, percent-clipped=0.0 2024-08-09 22:36:49,454 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 8 from Vox, 33 fro AS 2024-08-09 22:36:56,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=233980.0, ans=0.125 2024-08-09 22:37:06,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=234080.0, ans=0.0 2024-08-09 22:37:13,816 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
15 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-09 22:37:25,785 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 22:37:27,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2024-08-09 22:37:28,783 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0 2024-08-09 22:37:29,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=234280.0, ans=0.0 2024-08-09 22:37:33,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.02 vs. limit=15.0 2024-08-09 22:37:35,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-08-09 22:37:38,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=234280.0, ans=0.0 2024-08-09 22:37:49,148 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-08-09 22:37:56,102 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 8950, loss[loss=0.1236, beats_loss=0.0115, ecapa_loss=0.0004008, whisper_loss=0.1081, over 22157.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01275, ecapa_loss=0.00032, whisper_loss=0.1007, over 3858331.44 frames. ], batch size: 95, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:37:56,253 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
25 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-09 22:38:19,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=234580.0, ans=0.0 2024-08-09 22:38:33,178 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-09 22:38:41,930 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs. limit=6.0 2024-08-09 22:38:44,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2024-08-09 22:38:48,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=234780.0, ans=0.125 2024-08-09 22:38:57,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=234880.0, ans=0.125 2024-08-09 22:39:04,958 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9000, loss[loss=0.084, beats_loss=0.01292, ecapa_loss=0.0003931, whisper_loss=0.06715, over 17914.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.0127, ecapa_loss=0.0003238, whisper_loss=0.1009, over 3884906.17 frames. ], batch size: 77, lr: 2.43e-02, grad_scale: 65536.0 2024-08-09 22:39:04,959 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-09 22:39:43,695 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on ASR_libri: loss=0.2806, beats_loss=0, ecapa_loss=0.0009572, whisper_loss=0.2711, over 922467.00 frames. 2024-08-09 22:40:01,257 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on SV_voxceleb1: loss=0.008746, beats_loss=0, ecapa_loss=0.0008746, whisper_loss=0, over 939242.00 frames. 
2024-08-09 22:40:57,573 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.6702, 1.7408, 2.4630, 1.8968], device='cuda:2') 2024-08-09 22:41:51,956 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on AT_audioset: loss=0.02976, beats_loss=0.02976, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 22:41:51,960 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-09 22:41:54,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 3.054e+01 3.477e+01 3.947e+01 5.844e+01, threshold=6.953e+01, percent-clipped=0.0 2024-08-09 22:41:54,618 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-09 22:42:02,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=234980.0, ans=0.0 2024-08-09 22:42:20,233 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.203e-03 2024-08-09 22:42:28,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=235180.0, ans=0.0 2024-08-09 22:42:29,238 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.82 vs. limit=15.0 2024-08-09 22:42:32,913 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-09 22:42:57,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=235380.0, ans=0.125 2024-08-09 22:43:04,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9050, loss[loss=0.1036, beats_loss=0.01424, ecapa_loss=0.0003255, whisper_loss=0.08611, over 16430.00 frames. 
], tot_loss[loss=0.1176, beats_loss=0.0126, ecapa_loss=0.0003239, whisper_loss=0.1018, over 3890134.27 frames. ], batch size: 64, lr: 2.43e-02, grad_scale: 65536.0 2024-08-09 22:43:32,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=235580.0, ans=0.2 2024-08-09 22:43:34,147 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.44 vs. limit=15.0 2024-08-09 22:43:40,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235680.0, ans=0.1 2024-08-09 22:43:43,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=12.0 2024-08-09 22:44:00,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=235780.0, ans=0.0 2024-08-09 22:44:09,873 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-09 22:44:16,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=235980.0, ans=0.0 2024-08-09 22:44:17,555 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9100, loss[loss=0.1052, beats_loss=0.01545, ecapa_loss=0.0002647, whisper_loss=0.08713, over 23627.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01257, ecapa_loss=0.0003264, whisper_loss=0.1019, over 3899539.08 frames. 
], batch size: 95, lr: 2.43e-02, grad_scale: 65536.0 2024-08-09 22:44:20,403 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.942e+01 3.415e+01 3.847e+01 6.703e+01, threshold=6.829e+01, percent-clipped=0.0 2024-08-09 22:44:20,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=235980.0, ans=0.2 2024-08-09 22:44:21,947 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 22:44:22,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=235980.0, ans=0.125 2024-08-09 22:44:34,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=236080.0, ans=0.2 2024-08-09 22:44:40,305 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.82 vs. limit=15.0 2024-08-09 22:44:42,315 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 27 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 22:45:03,928 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 22:45:15,844 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-09 22:45:31,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=236380.0, ans=0.125 2024-08-09 22:45:34,885 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9150, loss[loss=0.1193, beats_loss=0.01285, ecapa_loss=0.0003161, whisper_loss=0.1033, over 22343.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01261, ecapa_loss=0.0003252, whisper_loss=0.1017, over 3908223.71 frames. 
], batch size: 89, lr: 2.43e-02, grad_scale: 65536.0 2024-08-09 22:45:35,752 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-09 22:45:36,465 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 22:45:43,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=236480.0, ans=0.0 2024-08-09 22:45:45,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=236480.0, ans=0.0 2024-08-09 22:45:51,188 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.800e-02 2024-08-09 22:45:53,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=236580.0, ans=0.125 2024-08-09 22:46:26,011 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 20 from LS+wenet, 31 from Vox, 43 fro AS 2024-08-09 22:46:42,612 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-09 22:46:48,544 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9200, loss[loss=0.114, beats_loss=0.01228, ecapa_loss=0.0003013, whisper_loss=0.09873, over 17770.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01261, ecapa_loss=0.000326, whisper_loss=0.1016, over 3905243.51 frames. 
], batch size: 68, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:46:51,881 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.835e+01 3.303e+01 3.887e+01 6.132e+01, threshold=6.605e+01, percent-clipped=0.0 2024-08-09 22:46:57,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=236980.0, ans=10.0 2024-08-09 22:47:00,397 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-09 22:47:03,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=237080.0, ans=0.1 2024-08-09 22:47:10,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=237080.0, ans=0.125 2024-08-09 22:48:04,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9250, loss[loss=0.09721, beats_loss=0.0161, ecapa_loss=0.0003474, whisper_loss=0.07764, over 21783.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01274, ecapa_loss=0.0003248, whisper_loss=0.1006, over 3881351.69 frames. ], batch size: 93, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:48:17,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=237580.0, ans=0.0 2024-08-09 22:48:32,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237580.0, ans=0.1 2024-08-09 22:48:43,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=1.86 vs. limit=15.0 2024-08-09 22:48:47,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237680.0, ans=0.1 2024-08-09 22:49:19,421 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
29 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-09 22:49:20,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=237880.0, ans=0.125 2024-08-09 22:49:25,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9300, loss[loss=0.1071, beats_loss=0.0151, ecapa_loss=0.0002561, whisper_loss=0.08949, over 20472.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01269, ecapa_loss=0.0003244, whisper_loss=0.1006, over 3907931.40 frames. ], batch size: 79, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:49:27,792 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 3.042e+01 3.380e+01 4.213e+01 8.159e+01, threshold=6.761e+01, percent-clipped=3.0 2024-08-09 22:49:33,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=237980.0, ans=0.125 2024-08-09 22:49:40,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238080.0, ans=0.1 2024-08-09 22:49:56,120 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 31 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-09 22:49:56,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=238180.0, ans=0.125 2024-08-09 22:50:03,139 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-09 22:50:07,728 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 22:50:08,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2024-08-09 22:50:09,295 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
22 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 22:50:24,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=238380.0, ans=0.125 2024-08-09 22:50:35,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=238380.0, ans=0.0 2024-08-09 22:50:41,272 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9350, loss[loss=0.09806, beats_loss=0.01381, ecapa_loss=0.0003679, whisper_loss=0.08058, over 21784.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01274, ecapa_loss=0.0003252, whisper_loss=0.1001, over 3887545.90 frames. ], batch size: 93, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:50:41,457 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-09 22:50:41,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=238480.0, ans=0.2 2024-08-09 22:50:59,390 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-09 22:51:03,751 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 16 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 22:51:32,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=238580.0, ans=0.2 2024-08-09 22:51:35,682 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-09 22:51:36,456 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=12.0 2024-08-09 22:51:52,261 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
15 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-09 22:51:56,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=238680.0, ans=0.0 2024-08-09 22:52:15,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=238780.0, ans=0.0 2024-08-09 22:52:21,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=238880.0, ans=0.125 2024-08-09 22:52:29,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=238880.0, ans=0.2 2024-08-09 22:52:34,433 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9400, loss[loss=0.1153, beats_loss=0.01264, ecapa_loss=0.0003247, whisper_loss=0.09937, over 15832.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01279, ecapa_loss=0.000324, whisper_loss=0.1003, over 3865050.50 frames. ], batch size: 63, lr: 2.41e-02, grad_scale: 65536.0 2024-08-09 22:52:37,849 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.975e+01 3.274e+01 3.809e+01 6.351e+01, threshold=6.548e+01, percent-clipped=0.0 2024-08-09 22:52:44,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=238980.0, ans=0.0 2024-08-09 22:52:54,369 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 33 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-09 22:53:21,364 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 22:53:38,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=239280.0, ans=0.0 2024-08-09 22:53:39,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.91 vs. 
limit=15.0 2024-08-09 22:53:54,699 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-09 22:54:01,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=239380.0, ans=0.0 2024-08-09 22:54:06,880 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9450, loss[loss=0.129, beats_loss=0.0126, ecapa_loss=0.0003547, whisper_loss=0.1128, over 19043.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01279, ecapa_loss=0.0003257, whisper_loss=0.1006, over 3854338.94 frames. ], batch size: 77, lr: 2.41e-02, grad_scale: 65536.0 2024-08-09 22:54:11,849 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 22:54:32,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=239580.0, ans=0.0 2024-08-09 22:54:58,197 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-09 22:55:09,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-08-09 22:55:16,986 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-09 22:55:35,115 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-09 22:55:40,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=239880.0, ans=12.0 2024-08-09 22:55:44,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=239880.0, ans=0.125 2024-08-09 22:55:51,681 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9500, loss[loss=0.1191, beats_loss=0.0138, ecapa_loss=0.0003008, whisper_loss=0.1023, over 22973.00 frames. 
], tot_loss[loss=0.1166, beats_loss=0.01279, ecapa_loss=0.0003235, whisper_loss=0.1006, over 3869979.39 frames. ], batch size: 92, lr: 2.41e-02, grad_scale: 65536.0 2024-08-09 22:55:59,370 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.955e+01 3.513e+01 3.972e+01 7.065e+01, threshold=7.026e+01, percent-clipped=1.0 2024-08-09 22:56:32,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=240080.0, ans=0.025 2024-08-09 22:56:34,332 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 22:56:34,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=240080.0, ans=0.125 2024-08-09 22:57:09,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=240280.0, ans=0.125 2024-08-09 22:57:19,293 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-09 22:57:44,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=240380.0, ans=0.125 2024-08-09 22:57:50,832 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9550, loss[loss=0.1061, beats_loss=0.01204, ecapa_loss=0.0003051, whisper_loss=0.091, over 16321.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01274, ecapa_loss=0.0003258, whisper_loss=0.1004, over 3854843.44 frames. ], batch size: 64, lr: 2.41e-02, grad_scale: 131072.0 2024-08-09 22:57:51,341 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.780e-01 2024-08-09 22:58:09,864 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
24 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-09 22:58:26,532 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=15.0 2024-08-09 22:58:30,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=240580.0, ans=0.0 2024-08-09 22:58:43,864 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-09 22:58:45,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=240680.0, ans=0.125 2024-08-09 22:58:45,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=240680.0, ans=0.0 2024-08-09 22:59:02,177 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 22:59:10,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240780.0, ans=0.1 2024-08-09 22:59:20,621 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.20 vs. limit=22.5 2024-08-09 22:59:40,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5 2024-08-09 22:59:45,610 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-09 22:59:45,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=240980.0, ans=0.0 2024-08-09 22:59:46,628 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9600, loss[loss=0.1395, beats_loss=0.01026, ecapa_loss=0.0004532, whisper_loss=0.1247, over 19383.00 frames. 
], tot_loss[loss=0.117, beats_loss=0.01262, ecapa_loss=0.0003266, whisper_loss=0.1011, over 3843285.47 frames. ], batch size: 80, lr: 2.41e-02, grad_scale: 131072.0 2024-08-09 22:59:49,809 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.841e+01 3.249e+01 3.780e+01 5.366e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-09 23:00:00,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=240980.0, ans=0.025 2024-08-09 23:00:25,440 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 17 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-09 23:00:31,943 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-09 23:00:51,063 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 23:01:18,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2024-08-09 23:01:30,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=241380.0, ans=0.2 2024-08-09 23:01:33,894 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9650, loss[loss=0.1025, beats_loss=0.01174, ecapa_loss=0.0003069, whisper_loss=0.08771, over 16066.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01258, ecapa_loss=0.0003269, whisper_loss=0.1001, over 3816277.52 frames. ], batch size: 62, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:01:37,074 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 10 from Vox, 39 fro AS 2024-08-09 23:01:39,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=241480.0, ans=0.07 2024-08-09 23:01:50,089 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-09 23:01:55,073 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-09 23:02:00,837 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 35 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 23:02:14,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=241680.0, ans=0.95 2024-08-09 23:02:27,979 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-09 23:02:47,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=241880.0, ans=0.1 2024-08-09 23:02:54,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=241880.0, ans=0.0 2024-08-09 23:02:58,016 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9700, loss[loss=0.08016, beats_loss=0.01683, ecapa_loss=0.0002323, whisper_loss=0.061, over 14074.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01264, ecapa_loss=0.000325, whisper_loss=0.1004, over 3854580.84 frames. ], batch size: 54, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:03:01,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 3.064e+01 3.484e+01 4.019e+01 6.587e+01, threshold=6.968e+01, percent-clipped=2.0 2024-08-09 23:03:08,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=241980.0, ans=0.125 2024-08-09 23:03:14,381 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-09 23:03:41,987 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
29 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-09 23:03:58,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=242280.0, ans=0.125 2024-08-09 23:04:00,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242280.0, ans=0.1 2024-08-09 23:04:07,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=242380.0, ans=0.0 2024-08-09 23:04:21,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9750, loss[loss=0.0999, beats_loss=0.01484, ecapa_loss=0.0002454, whisper_loss=0.0826, over 22097.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01262, ecapa_loss=0.0003231, whisper_loss=0.1009, over 3877026.28 frames. ], batch size: 91, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:04:30,210 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 23:04:41,936 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 23:05:04,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=242680.0, ans=0.0 2024-08-09 23:05:05,425 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-09 23:05:09,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=242780.0, ans=0.125 2024-08-09 23:05:25,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=242880.0, ans=0.125 2024-08-09 23:05:27,727 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-09 23:05:41,909 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9800, loss[loss=0.1058, beats_loss=0.01304, ecapa_loss=0.0003384, whisper_loss=0.08939, over 21670.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01263, ecapa_loss=0.0003215, whisper_loss=0.101, over 3864845.33 frames. ], batch size: 89, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:05:42,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=242980.0, ans=0.09899494936611666 2024-08-09 23:05:43,410 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-09 23:05:44,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.875e+01 3.358e+01 3.972e+01 6.084e+01, threshold=6.716e+01, percent-clipped=0.0 2024-08-09 23:05:49,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.45 vs. limit=15.0 2024-08-09 23:05:54,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=15.0 2024-08-09 23:06:14,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=243180.0, ans=0.07 2024-08-09 23:06:19,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=243180.0, ans=0.05 2024-08-09 23:06:38,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.67 vs. limit=15.0 2024-08-09 23:06:38,967 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.44 vs. 
limit=15.0 2024-08-09 23:06:42,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=243280.0, ans=0.0 2024-08-09 23:06:43,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=243280.0, ans=0.025 2024-08-09 23:06:50,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=243380.0, ans=0.0 2024-08-09 23:07:03,112 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=28.98 vs. limit=22.5 2024-08-09 23:07:05,443 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9850, loss[loss=0.1249, beats_loss=0.01128, ecapa_loss=0.0003855, whisper_loss=0.1098, over 15809.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01263, ecapa_loss=0.0003202, whisper_loss=0.1012, over 3871737.98 frames. ], batch size: 66, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:07:20,023 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-09 23:07:30,590 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-09 23:08:02,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=243780.0, ans=0.125 2024-08-09 23:08:11,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=243780.0, ans=0.2 2024-08-09 23:08:33,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9900, loss[loss=0.1352, beats_loss=0.01381, ecapa_loss=0.0002701, whisper_loss=0.1187, over 23105.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01268, ecapa_loss=0.0003188, whisper_loss=0.101, over 3880428.91 frames. 
], batch size: 90, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:08:36,602 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 3.025e+01 3.445e+01 3.906e+01 6.336e+01, threshold=6.890e+01, percent-clipped=0.0 2024-08-09 23:08:37,781 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2024-08-09 23:08:40,331 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 23:08:41,188 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.55 vs. limit=22.5 2024-08-09 23:08:53,211 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-09 23:09:01,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=244080.0, ans=0.2 2024-08-09 23:09:12,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=244180.0, ans=0.125 2024-08-09 23:09:28,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=244280.0, ans=0.125 2024-08-09 23:09:36,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=244280.0, ans=0.125 2024-08-09 23:09:42,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=244380.0, ans=0.0 2024-08-09 23:09:44,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.13 vs. 
limit=12.0 2024-08-09 23:09:53,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=244480.0, ans=0.2 2024-08-09 23:09:55,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 9950, loss[loss=0.1254, beats_loss=0.01326, ecapa_loss=0.0002993, whisper_loss=0.1092, over 23562.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.0126, ecapa_loss=0.0003201, whisper_loss=0.101, over 3878092.00 frames. ], batch size: 90, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:10:14,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=244580.0, ans=0.0 2024-08-09 23:10:22,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=244580.0, ans=0.2 2024-08-09 23:11:18,706 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10000, loss[loss=0.1329, beats_loss=0.01157, ecapa_loss=0.0003511, whisper_loss=0.1178, over 19949.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01246, ecapa_loss=0.000324, whisper_loss=0.1017, over 3881624.58 frames. ], batch size: 81, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:11:22,326 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.876e+01 3.207e+01 3.745e+01 5.513e+01, threshold=6.413e+01, percent-clipped=0.0 2024-08-09 23:11:24,470 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 23:11:26,248 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 23:11:35,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=245080.0, ans=0.035 2024-08-09 23:11:47,135 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
17 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-09 23:11:48,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=245080.0, ans=0.0 2024-08-09 23:11:58,300 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-09 23:12:00,286 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-09 23:12:11,403 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-09 23:12:15,760 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 19 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-09 23:12:36,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=245380.0, ans=0.1 2024-08-09 23:12:50,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10050, loss[loss=0.1152, beats_loss=0.0133, ecapa_loss=0.0002509, whisper_loss=0.09938, over 16102.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01246, ecapa_loss=0.0003219, whisper_loss=0.1012, over 3860617.39 frames. ], batch size: 60, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:12:51,583 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=15.0 2024-08-09 23:12:57,766 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-09 23:13:06,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=245480.0, ans=15.0 2024-08-09 23:13:08,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=245580.0, ans=0.125 2024-08-09 23:13:11,610 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-09 23:13:12,968 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-09 23:13:21,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0 2024-08-09 23:13:28,426 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-09 23:13:28,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=245680.0, ans=0.125 2024-08-09 23:13:34,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=245680.0, ans=0.2 2024-08-09 23:13:50,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=245780.0, ans=0.125 2024-08-09 23:13:53,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=245780.0, ans=0.125 2024-08-09 23:13:54,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=12.0 2024-08-09 23:13:56,107 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-09 23:14:12,700 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 23 from Vox, 17 fro AS 2024-08-09 23:14:24,037 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10100, loss[loss=0.1101, beats_loss=0.0124, ecapa_loss=0.0004011, whisper_loss=0.0937, over 17427.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01249, ecapa_loss=0.0003238, whisper_loss=0.1012, over 3907054.08 frames. 
], batch size: 71, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:14:28,114 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.998e+01 3.344e+01 3.820e+01 6.746e+01, threshold=6.687e+01, percent-clipped=3.0 2024-08-09 23:14:38,084 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2024-08-09 23:14:39,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=246080.0, ans=0.125 2024-08-09 23:14:41,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=246080.0, ans=0.125 2024-08-09 23:15:06,882 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-09 23:15:19,795 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-09 23:15:28,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=246380.0, ans=0.0 2024-08-09 23:15:29,910 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-09 23:15:35,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=246380.0, ans=0.125 2024-08-09 23:15:36,939 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 33 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 23:15:43,283 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10150, loss[loss=0.1103, beats_loss=0.01107, ecapa_loss=0.0003502, whisper_loss=0.0957, over 15338.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01258, ecapa_loss=0.000323, whisper_loss=0.1004, over 3910038.02 frames. 
], batch size: 61, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:16:09,851 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-09 23:16:12,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=246680.0, ans=0.0 2024-08-09 23:16:19,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=246680.0, ans=0.125 2024-08-09 23:16:20,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=246680.0, ans=0.1 2024-08-09 23:16:27,306 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-09 23:16:32,841 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-09 23:16:57,859 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10200, loss[loss=0.1111, beats_loss=0.01407, ecapa_loss=0.0002929, whisper_loss=0.09409, over 21196.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01261, ecapa_loss=0.0003216, whisper_loss=0.1011, over 3932329.31 frames. ], batch size: 85, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:17:00,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.913e+01 3.327e+01 3.843e+01 5.703e+01, threshold=6.654e+01, percent-clipped=0.0 2024-08-09 23:17:29,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. 
limit=15.0 2024-08-09 23:17:34,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=247180.0, ans=0.125 2024-08-09 23:17:37,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=247180.0, ans=0.0 2024-08-09 23:17:37,542 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2024-08-09 23:17:39,546 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-09 23:17:56,082 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-09 23:18:09,935 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10250, loss[loss=0.1105, beats_loss=0.01037, ecapa_loss=0.0003447, whisper_loss=0.09665, over 13853.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01264, ecapa_loss=0.0003211, whisper_loss=0.1009, over 3912870.90 frames. ], batch size: 54, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:18:25,465 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-09 23:18:31,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=247580.0, ans=0.125 2024-08-09 23:18:45,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=247680.0, ans=0.0 2024-08-09 23:18:58,114 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2024-08-09 23:19:10,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. 
limit=15.0 2024-08-09 23:19:14,622 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2024-08-09 23:19:18,319 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.232e+00 2024-08-09 23:19:21,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10300, loss[loss=0.1131, beats_loss=0.01541, ecapa_loss=0.0003174, whisper_loss=0.09455, over 21910.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01261, ecapa_loss=0.0003215, whisper_loss=0.1016, over 3935336.16 frames. ], batch size: 91, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:19:22,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=247980.0, ans=0.2 2024-08-09 23:19:25,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 3.179e+01 3.546e+01 4.118e+01 7.373e+01, threshold=7.091e+01, percent-clipped=1.0 2024-08-09 23:19:30,208 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.103e+04 2024-08-09 23:19:35,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=248080.0, ans=0.07 2024-08-09 23:19:50,696 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-09 23:19:52,317 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-09 23:20:02,599 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 23:20:07,910 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-09 23:20:11,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=248280.0, ans=0.1 2024-08-09 23:20:19,141 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 23:20:34,311 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10350, loss[loss=0.1068, beats_loss=0.01419, ecapa_loss=0.0002431, whisper_loss=0.09016, over 19548.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01268, ecapa_loss=0.0003207, whisper_loss=0.1019, over 3923058.54 frames. ], batch size: 74, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:20:37,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=248480.0, ans=0.125 2024-08-09 23:20:43,817 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2024-08-09 23:20:52,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.35 vs. limit=22.5 2024-08-09 23:20:59,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2024-08-09 23:21:06,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=248680.0, ans=0.1 2024-08-09 23:21:12,160 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
31 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-09 23:21:23,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=248780.0, ans=0.125 2024-08-09 23:21:34,345 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2024-08-09 23:21:36,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=248880.0, ans=0.125 2024-08-09 23:21:46,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10400, loss[loss=0.1029, beats_loss=0.01489, ecapa_loss=0.000254, whisper_loss=0.08551, over 23165.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01262, ecapa_loss=0.0003189, whisper_loss=0.1025, over 3950052.18 frames. ], batch size: 93, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:21:48,820 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.398e+01 2.757e+01 3.226e+01 3.794e+01 6.112e+01, threshold=6.451e+01, percent-clipped=0.0 2024-08-09 23:22:05,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=249080.0, ans=0.125 2024-08-09 23:22:08,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=249080.0, ans=0.125 2024-08-09 23:22:16,819 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 23:22:20,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=249180.0, ans=0.0 2024-08-09 23:22:23,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=249180.0, ans=0.2 2024-08-09 23:22:23,274 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=249180.0, ans=0.1 2024-08-09 23:22:39,747 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=12.0 2024-08-09 23:22:42,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=249380.0, ans=0.0 2024-08-09 23:22:44,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=249380.0, ans=0.125 2024-08-09 23:22:48,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.02 vs. limit=10.0 2024-08-09 23:22:52,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=249380.0, ans=0.125 2024-08-09 23:22:52,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=15.0 2024-08-09 23:22:54,650 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10450, loss[loss=0.1459, beats_loss=0.01263, ecapa_loss=0.0002824, whisper_loss=0.1304, over 23630.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01256, ecapa_loss=0.0003183, whisper_loss=0.1026, over 3947330.57 frames. ], batch size: 92, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:23:00,510 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 11 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-09 23:23:00,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=249480.0, ans=0.1 2024-08-09 23:23:29,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.94 vs. 
limit=15.0 2024-08-09 23:23:30,436 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-09 23:23:40,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=249780.0, ans=0.1 2024-08-09 23:23:41,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=249780.0, ans=0.125 2024-08-09 23:23:42,913 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-09 23:23:54,714 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-09 23:23:55,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=249880.0, ans=0.0 2024-08-09 23:23:57,578 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.826e+00 2024-08-09 23:23:57,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=249880.0, ans=0.125 2024-08-09 23:23:58,066 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.98 vs. limit=15.0 2024-08-09 23:24:02,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10500, loss[loss=0.1233, beats_loss=0.01038, ecapa_loss=0.0003532, whisper_loss=0.1093, over 14316.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01258, ecapa_loss=0.0003188, whisper_loss=0.1022, over 3924888.09 frames. ], batch size: 56, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:24:05,303 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.948e+01 3.458e+01 4.084e+01 6.883e+01, threshold=6.915e+01, percent-clipped=1.0 2024-08-09 23:24:06,864 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
28 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-09 23:24:08,975 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.242e-01 2024-08-09 23:24:20,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.83 vs. limit=10.0 2024-08-09 23:24:28,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=250080.0, ans=0.125 2024-08-09 23:24:40,930 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-09 23:24:45,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=250280.0, ans=0.125 2024-08-09 23:24:49,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=250280.0, ans=0.125 2024-08-09 23:24:50,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=250280.0, ans=0.125 2024-08-09 23:25:13,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10550, loss[loss=0.0977, beats_loss=0.01447, ecapa_loss=0.0003411, whisper_loss=0.07982, over 19935.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01253, ecapa_loss=0.0003217, whisper_loss=0.1017, over 3894701.07 frames. ], batch size: 84, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:25:19,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=250480.0, ans=0.125 2024-08-09 23:25:44,568 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-09 23:25:51,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=250680.0, ans=0.125 2024-08-09 23:25:52,429 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 23:26:22,703 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10600, loss[loss=0.1136, beats_loss=0.01446, ecapa_loss=0.0002641, whisper_loss=0.09649, over 23461.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01258, ecapa_loss=0.0003206, whisper_loss=0.1009, over 3929633.62 frames. ], batch size: 93, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:26:25,461 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 3.120e+01 3.519e+01 3.971e+01 7.530e+01, threshold=7.037e+01, percent-clipped=1.0 2024-08-09 23:26:32,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=250980.0, ans=0.0 2024-08-09 23:26:36,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=251080.0, ans=0.125 2024-08-09 23:26:43,504 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-09 23:26:53,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=251180.0, ans=0.0 2024-08-09 23:26:58,027 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-09 23:27:00,790 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 23:27:22,032 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
20 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-09 23:27:32,187 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10650, loss[loss=0.1183, beats_loss=0.01439, ecapa_loss=0.0003199, whisper_loss=0.1007, over 22072.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01261, ecapa_loss=0.0003191, whisper_loss=0.1015, over 3922788.71 frames. ], batch size: 93, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:27:33,657 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-09 23:27:36,803 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.892e-02 2024-08-09 23:28:04,327 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 28 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-09 23:28:11,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=251680.0, ans=10.0 2024-08-09 23:28:13,029 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.84 vs. limit=10.0 2024-08-09 23:28:20,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=251780.0, ans=0.07 2024-08-09 23:28:41,430 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10700, loss[loss=0.1165, beats_loss=0.01317, ecapa_loss=0.0003039, whisper_loss=0.1003, over 22204.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01261, ecapa_loss=0.0003194, whisper_loss=0.1012, over 3904901.20 frames. 
], batch size: 88, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:28:44,311 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.278e+01 2.878e+01 3.295e+01 3.921e+01 5.869e+01, threshold=6.590e+01, percent-clipped=0.0 2024-08-09 23:28:44,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=251980.0, ans=0.125 2024-08-09 23:28:55,550 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 23:28:58,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=252080.0, ans=0.125 2024-08-09 23:29:13,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=16.89 vs. limit=12.0 2024-08-09 23:29:26,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=252280.0, ans=0.125 2024-08-09 23:29:43,383 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-09 23:29:43,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=252380.0, ans=0.0 2024-08-09 23:29:48,475 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-09 23:29:51,072 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10750, loss[loss=0.1117, beats_loss=0.01004, ecapa_loss=0.000354, whisper_loss=0.09808, over 19856.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01262, ecapa_loss=0.0003185, whisper_loss=0.1011, over 3896112.88 frames. ], batch size: 79, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:29:58,483 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-09 23:30:01,674 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 23:30:05,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=252580.0, ans=0.1 2024-08-09 23:30:18,479 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-09 23:30:36,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=252780.0, ans=0.125 2024-08-09 23:31:00,293 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10800, loss[loss=0.1238, beats_loss=0.01352, ecapa_loss=0.0003669, whisper_loss=0.1066, over 20185.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01262, ecapa_loss=0.0003175, whisper_loss=0.101, over 3909121.52 frames. ], batch size: 84, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:31:03,108 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 3.032e+01 3.349e+01 3.769e+01 6.080e+01, threshold=6.698e+01, percent-clipped=0.0 2024-08-09 23:31:21,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=253080.0, ans=0.2 2024-08-09 23:31:26,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=253180.0, ans=0.125 2024-08-09 23:31:30,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=253180.0, ans=0.125 2024-08-09 23:31:38,569 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
15 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-09 23:31:40,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=253280.0, ans=0.07 2024-08-09 23:31:44,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=253280.0, ans=0.125 2024-08-09 23:31:45,348 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-09 23:31:46,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=253280.0, ans=0.0 2024-08-09 23:31:46,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=253280.0, ans=0.05 2024-08-09 23:31:50,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=253280.0, ans=0.125 2024-08-09 23:32:01,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=253380.0, ans=0.125 2024-08-09 23:32:06,558 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-09 23:32:07,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10850, loss[loss=0.1102, beats_loss=0.01672, ecapa_loss=0.0002627, whisper_loss=0.09087, over 19237.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01263, ecapa_loss=0.0003165, whisper_loss=0.1016, over 3914800.24 frames. ], batch size: 79, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:32:17,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.16 vs. 
limit=6.0 2024-08-09 23:32:24,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=253580.0, ans=0.035 2024-08-09 23:32:27,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=253580.0, ans=0.0 2024-08-09 23:32:27,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=253580.0, ans=0.2 2024-08-09 23:32:34,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=12.0 2024-08-09 23:32:41,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=253680.0, ans=0.125 2024-08-09 23:32:45,985 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 23:32:48,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=253780.0, ans=0.125 2024-08-09 23:32:54,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=253780.0, ans=0.1 2024-08-09 23:32:58,941 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-09 23:33:14,298 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-09 23:33:15,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10900, loss[loss=0.1008, beats_loss=0.01124, ecapa_loss=0.0003405, whisper_loss=0.08617, over 15153.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01259, ecapa_loss=0.000316, whisper_loss=0.1022, over 3925855.40 frames. 
], batch size: 62, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:33:18,107 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+01 2.959e+01 3.403e+01 3.969e+01 5.664e+01, threshold=6.807e+01, percent-clipped=0.0 2024-08-09 23:33:21,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=253980.0, ans=0.05 2024-08-09 23:33:26,107 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=36.66 vs. limit=22.5 2024-08-09 23:33:52,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=254180.0, ans=0.1 2024-08-09 23:34:14,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.17 vs. limit=12.0 2024-08-09 23:34:20,182 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-09 23:34:21,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=254480.0, ans=0.0 2024-08-09 23:34:22,544 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 10950, loss[loss=0.131, beats_loss=0.0112, ecapa_loss=0.0003439, whisper_loss=0.1164, over 20365.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01255, ecapa_loss=0.0003171, whisper_loss=0.1021, over 3922145.84 frames. 
], batch size: 81, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:34:26,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=254480.0, ans=0.125 2024-08-09 23:34:30,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=254480.0, ans=0.0 2024-08-09 23:34:53,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=12.0 2024-08-09 23:34:54,246 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-09 23:35:14,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=15.0 2024-08-09 23:35:16,208 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-09 23:35:28,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=254880.0, ans=0.125 2024-08-09 23:35:30,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11000, loss[loss=0.1152, beats_loss=0.009222, ecapa_loss=0.0003728, whisper_loss=0.1022, over 21513.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01251, ecapa_loss=0.0003168, whisper_loss=0.102, over 3938444.66 frames. 
], batch size: 89, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:35:33,560 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.844e+01 3.291e+01 3.745e+01 5.513e+01, threshold=6.582e+01, percent-clipped=0.0 2024-08-09 23:35:46,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=255080.0, ans=0.125 2024-08-09 23:35:59,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=255180.0, ans=0.0 2024-08-09 23:36:27,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=255380.0, ans=0.2 2024-08-09 23:36:30,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=255380.0, ans=0.125 2024-08-09 23:36:41,273 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11050, loss[loss=0.08935, beats_loss=0.01641, ecapa_loss=0.0002027, whisper_loss=0.07092, over 14709.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01246, ecapa_loss=0.0003183, whisper_loss=0.1018, over 3918400.74 frames. ], batch size: 55, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:36:45,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=255480.0, ans=0.125 2024-08-09 23:36:47,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=255480.0, ans=0.0 2024-08-09 23:37:08,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.76 vs. 
limit=12.0 2024-08-09 23:37:09,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=255680.0, ans=0.1 2024-08-09 23:37:33,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=255780.0, ans=0.0 2024-08-09 23:37:46,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=255880.0, ans=0.0 2024-08-09 23:37:49,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.25 vs. limit=15.0 2024-08-09 23:37:50,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11100, loss[loss=0.1567, beats_loss=0.009987, ecapa_loss=0.000275, whisper_loss=0.144, over 21288.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01249, ecapa_loss=0.0003181, whisper_loss=0.1017, over 3908956.54 frames. ], batch size: 74, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:37:53,141 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.392e+01 3.083e+01 3.527e+01 4.357e+01 6.576e+01, threshold=7.054e+01, percent-clipped=0.0 2024-08-09 23:37:54,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=255980.0, ans=0.0 2024-08-09 23:38:25,501 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-09 23:38:25,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=256180.0, ans=0.035 2024-08-09 23:38:36,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=256280.0, ans=0.125 2024-08-09 23:38:52,889 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-09 23:38:59,606 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11150, loss[loss=0.1426, beats_loss=0.01233, ecapa_loss=0.0003075, whisper_loss=0.1272, over 23982.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01257, ecapa_loss=0.0003163, whisper_loss=0.101, over 3947433.01 frames. ], batch size: 94, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:39:01,166 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-09 23:39:06,664 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-09 23:39:06,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256480.0, ans=0.1 2024-08-09 23:39:13,117 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.11 vs. limit=10.0 2024-08-09 23:39:18,445 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 23:39:18,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=256580.0, ans=0.0 2024-08-09 23:39:33,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=256680.0, ans=0.035 2024-08-09 23:39:35,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=256680.0, ans=0.125 2024-08-09 23:39:58,603 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 23:39:58,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=256880.0, ans=0.125 2024-08-09 23:40:05,000 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=15.0 2024-08-09 23:40:09,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11200, loss[loss=0.1253, beats_loss=0.01258, ecapa_loss=0.0002925, whisper_loss=0.1098, over 18039.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01254, ecapa_loss=0.0003157, whisper_loss=0.1012, over 3900939.62 frames. ], batch size: 72, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:40:12,380 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 3.109e+01 3.535e+01 4.149e+01 6.453e+01, threshold=7.070e+01, percent-clipped=0.0 2024-08-09 23:40:13,157 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-09 23:40:21,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=256980.0, ans=0.0 2024-08-09 23:40:25,821 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-09 23:40:27,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=257080.0, ans=0.125 2024-08-09 23:40:47,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=257180.0, ans=0.125 2024-08-09 23:40:48,484 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.45 vs. 
limit=15.0 2024-08-09 23:40:50,588 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-09 23:40:53,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=257280.0, ans=0.125 2024-08-09 23:41:16,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=257380.0, ans=0.0 2024-08-09 23:41:19,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11250, loss[loss=0.1072, beats_loss=0.01235, ecapa_loss=0.0002912, whisper_loss=0.09191, over 20367.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01251, ecapa_loss=0.000316, whisper_loss=0.102, over 3878010.18 frames. ], batch size: 77, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:41:19,954 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-09 23:41:35,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=257580.0, ans=0.125 2024-08-09 23:41:49,790 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.62 vs. limit=22.5 2024-08-09 23:41:52,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=257680.0, ans=0.0 2024-08-09 23:42:01,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=257780.0, ans=0.125 2024-08-09 23:42:07,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=257780.0, ans=0.5 2024-08-09 23:42:28,236 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11300, loss[loss=0.09465, beats_loss=0.01365, ecapa_loss=0.0003243, whisper_loss=0.07775, over 18683.00 frames. 
], tot_loss[loss=0.1177, beats_loss=0.01242, ecapa_loss=0.0003156, whisper_loss=0.1021, over 3877595.56 frames. ], batch size: 77, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:42:31,217 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.444e+01 3.110e+01 3.449e+01 4.025e+01 6.550e+01, threshold=6.899e+01, percent-clipped=0.0 2024-08-09 23:42:36,251 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.48 vs. limit=22.5 2024-08-09 23:42:40,593 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-09 23:42:47,343 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 33 from Vox, 38 fro AS 2024-08-09 23:43:11,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=258280.0, ans=0.0 2024-08-09 23:43:20,289 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 23:43:20,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=258280.0, ans=0.2 2024-08-09 23:43:26,662 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-09 23:43:36,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11350, loss[loss=0.1133, beats_loss=0.01492, ecapa_loss=0.0003143, whisper_loss=0.09523, over 18093.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01247, ecapa_loss=0.000316, whisper_loss=0.1019, over 3902217.19 frames. 
], batch size: 73, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:43:36,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=258480.0, ans=10.0 2024-08-09 23:43:41,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=258480.0, ans=0.125 2024-08-09 23:43:44,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258480.0, ans=0.1 2024-08-09 23:43:47,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=258480.0, ans=0.0 2024-08-09 23:43:54,605 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 42 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-09 23:44:00,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=258580.0, ans=0.1 2024-08-09 23:44:01,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=258580.0, ans=0.1 2024-08-09 23:44:14,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=258680.0, ans=0.125 2024-08-09 23:44:18,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=258780.0, ans=0.0 2024-08-09 23:44:19,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=258780.0, ans=0.125 2024-08-09 23:44:23,198 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
16 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-09 23:44:33,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=258880.0, ans=0.125 2024-08-09 23:44:38,005 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=15.0 2024-08-09 23:44:41,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=258880.0, ans=0.125 2024-08-09 23:44:42,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=258880.0, ans=0.2 2024-08-09 23:44:44,752 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11400, loss[loss=0.1101, beats_loss=0.01087, ecapa_loss=0.0003546, whisper_loss=0.09564, over 14498.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01246, ecapa_loss=0.0003153, whisper_loss=0.1019, over 3903960.68 frames. ], batch size: 60, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:44:47,681 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.272e+01 2.889e+01 3.232e+01 3.833e+01 5.860e+01, threshold=6.464e+01, percent-clipped=0.0 2024-08-09 23:45:01,235 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 23:45:02,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=259080.0, ans=0.2 2024-08-09 23:45:02,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2024-08-09 23:45:12,109 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
23 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 23:45:18,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=259180.0, ans=0.125 2024-08-09 23:45:23,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=259180.0, ans=0.2 2024-08-09 23:45:39,413 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 23:45:42,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259280.0, ans=0.1 2024-08-09 23:45:55,676 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-09 23:45:58,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11450, loss[loss=0.1134, beats_loss=0.0118, ecapa_loss=0.0003688, whisper_loss=0.09793, over 22332.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01255, ecapa_loss=0.0003147, whisper_loss=0.1012, over 3899406.52 frames. ], batch size: 89, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:46:00,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=259480.0, ans=0.125 2024-08-09 23:46:11,447 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-09 23:46:23,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=259580.0, ans=0.05 2024-08-09 23:46:37,860 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-09 23:46:54,449 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
19 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-09 23:47:08,723 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11500, loss[loss=0.07654, beats_loss=0.01512, ecapa_loss=0.0002315, whisper_loss=0.05911, over 18225.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01251, ecapa_loss=0.0003152, whisper_loss=0.1014, over 3908052.14 frames. ], batch size: 71, lr: 2.32e-02, grad_scale: 131072.0 2024-08-09 23:47:11,356 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 3.028e+01 3.430e+01 4.047e+01 6.324e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-09 23:47:32,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=260080.0, ans=0.2 2024-08-09 23:47:33,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=260080.0, ans=0.125 2024-08-09 23:47:55,621 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 23:48:06,128 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 20 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-09 23:48:17,463 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11550, loss[loss=0.1441, beats_loss=0.01018, ecapa_loss=0.000294, whisper_loss=0.1309, over 18301.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01255, ecapa_loss=0.0003159, whisper_loss=0.1011, over 3906347.63 frames. 
], batch size: 67, lr: 2.32e-02, grad_scale: 262144.0 2024-08-09 23:48:17,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=260480.0, ans=10.0 2024-08-09 23:48:33,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=260580.0, ans=0.0 2024-08-09 23:48:48,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=260680.0, ans=0.0 2024-08-09 23:48:51,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=260680.0, ans=0.125 2024-08-09 23:49:03,323 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-09 23:49:25,548 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-09 23:49:26,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11600, loss[loss=0.1232, beats_loss=0.009534, ecapa_loss=0.0003838, whisper_loss=0.1099, over 17665.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01258, ecapa_loss=0.0003139, whisper_loss=0.1012, over 3894094.90 frames. ], batch size: 72, lr: 2.32e-02, grad_scale: 262144.0 2024-08-09 23:49:29,288 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.873e+01 3.365e+01 3.781e+01 5.038e+01, threshold=6.731e+01, percent-clipped=0.0 2024-08-09 23:49:31,275 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 23:49:46,978 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
27 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-09 23:49:54,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=261180.0, ans=0.0 2024-08-09 23:50:04,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261180.0, ans=0.1 2024-08-09 23:50:05,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=261180.0, ans=0.09899494936611666 2024-08-09 23:50:30,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261380.0, ans=0.1 2024-08-09 23:50:34,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=261380.0, ans=0.0 2024-08-09 23:50:37,122 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11650, loss[loss=0.1178, beats_loss=0.01459, ecapa_loss=0.0002994, whisper_loss=0.1002, over 23549.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01261, ecapa_loss=0.0003137, whisper_loss=0.1012, over 3882385.61 frames. ], batch size: 93, lr: 2.32e-02, grad_scale: 262144.0 2024-08-09 23:50:39,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0 2024-08-09 23:50:44,840 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-09 23:50:54,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=261580.0, ans=0.1 2024-08-09 23:51:04,463 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-09 23:51:05,226 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.20 vs. 
limit=10.0 2024-08-09 23:51:17,542 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.25 vs. limit=22.5 2024-08-09 23:51:28,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261780.0, ans=0.1 2024-08-09 23:51:46,915 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11700, loss[loss=0.1505, beats_loss=0.01089, ecapa_loss=0.0002922, whisper_loss=0.1367, over 23372.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01266, ecapa_loss=0.0003143, whisper_loss=0.1011, over 3881370.20 frames. ], batch size: 86, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:51:48,411 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 23:51:49,562 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 3.059e+01 3.535e+01 4.179e+01 1.066e+02, threshold=7.070e+01, percent-clipped=1.0 2024-08-09 23:51:51,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=261980.0, ans=0.0 2024-08-09 23:51:59,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=262080.0, ans=0.0 2024-08-09 23:52:25,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=262180.0, ans=0.0 2024-08-09 23:52:39,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0 2024-08-09 23:52:54,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11750, loss[loss=0.1211, beats_loss=0.0135, ecapa_loss=0.000281, whisper_loss=0.1048, over 20255.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01269, ecapa_loss=0.000315, whisper_loss=0.1006, over 3899919.96 frames. 
], batch size: 80, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:52:54,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=262480.0, ans=0.125 2024-08-09 23:53:21,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=262680.0, ans=0.0 2024-08-09 23:53:28,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=262680.0, ans=0.125 2024-08-09 23:53:35,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=262780.0, ans=0.0 2024-08-09 23:54:02,791 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11800, loss[loss=0.09442, beats_loss=0.01372, ecapa_loss=0.0003231, whisper_loss=0.07747, over 18778.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.0127, ecapa_loss=0.0003127, whisper_loss=0.1003, over 3896578.28 frames. ], batch size: 77, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:54:05,994 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 3.014e+01 3.516e+01 4.289e+01 8.691e+01, threshold=7.033e+01, percent-clipped=2.0 2024-08-09 23:54:10,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=262980.0, ans=0.125 2024-08-09 23:54:13,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=262980.0, ans=0.1 2024-08-09 23:54:19,188 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 23:54:20,477 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 33 from Vox, 34 fro AS 2024-08-09 23:54:26,559 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-09 23:54:46,940 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-09 23:54:51,045 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-09 23:55:02,881 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-09 23:55:11,176 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11850, loss[loss=0.1255, beats_loss=0.01243, ecapa_loss=0.0003218, whisper_loss=0.1098, over 20017.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01267, ecapa_loss=0.0003158, whisper_loss=0.1001, over 3909605.81 frames. ], batch size: 79, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:55:19,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=263480.0, ans=0.125 2024-08-09 23:55:23,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=263580.0, ans=0.125 2024-08-09 23:55:25,620 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 23:55:37,949 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-09 23:55:43,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=263680.0, ans=0.125 2024-08-09 23:55:45,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. 
limit=15.0 2024-08-09 23:56:16,298 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.574e-01 2024-08-09 23:56:16,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=263880.0, ans=0.0 2024-08-09 23:56:18,356 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11900, loss[loss=0.09748, beats_loss=0.01155, ecapa_loss=0.0004029, whisper_loss=0.08191, over 14133.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01271, ecapa_loss=0.0003155, whisper_loss=0.1003, over 3907773.26 frames. ], batch size: 62, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:56:20,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263980.0, ans=0.1 2024-08-09 23:56:21,133 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.968e+01 3.550e+01 4.423e+01 6.843e+01, threshold=7.099e+01, percent-clipped=0.0 2024-08-09 23:56:22,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=263980.0, ans=0.0 2024-08-09 23:56:28,465 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-09 23:56:30,052 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.841e-01 2024-08-09 23:56:38,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=264080.0, ans=0.125 2024-08-09 23:56:42,873 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.21 vs. limit=22.5 2024-08-09 23:56:44,603 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 23:56:56,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=264180.0, ans=0.0 2024-08-09 23:57:06,390 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 23:57:22,882 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 23:57:25,716 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-09 23:57:26,843 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 11950, loss[loss=0.09805, beats_loss=0.01414, ecapa_loss=0.0002914, whisper_loss=0.081, over 20337.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01254, ecapa_loss=0.0003179, whisper_loss=0.101, over 3910693.02 frames. ], batch size: 81, lr: 2.30e-02, grad_scale: 262144.0 2024-08-09 23:57:44,026 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 23:57:56,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=264680.0, ans=0.0 2024-08-09 23:57:59,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=264680.0, ans=0.125 2024-08-09 23:58:01,213 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-09 23:58:11,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=264780.0, ans=0.125 2024-08-09 23:58:35,882 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12000, loss[loss=0.1096, beats_loss=0.01433, ecapa_loss=0.0002535, whisper_loss=0.0927, over 22592.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01257, ecapa_loss=0.0003161, whisper_loss=0.1002, over 3893345.56 frames. 
], batch size: 87, lr: 2.30e-02, grad_scale: 262144.0 2024-08-09 23:58:35,882 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-09 23:59:15,176 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on ASR_libri: loss=0.2807, beats_loss=0, ecapa_loss=0.0009345, whisper_loss=0.2713, over 922467.00 frames. 2024-08-09 23:59:32,510 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on SV_voxceleb1: loss=0.008336, beats_loss=0, ecapa_loss=0.0008336, whisper_loss=0, over 939242.00 frames. 2024-08-10 00:01:27,330 INFO [train_multi_KD3.py:1149] (2/4) Epoch 2, validation on AT_audioset: loss=0.02968, beats_loss=0.02968, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 00:01:27,334 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 00:01:29,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+01 2.941e+01 3.442e+01 3.928e+01 6.406e+01, threshold=6.884e+01, percent-clipped=0.0 2024-08-10 00:01:38,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=264980.0, ans=0.0 2024-08-10 00:01:40,135 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 00:01:45,550 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 20 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-10 00:01:49,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=265080.0, ans=0.125 2024-08-10 00:02:05,593 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5 2024-08-10 00:02:09,184 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
34 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-10 00:02:09,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=265280.0, ans=0.125 2024-08-10 00:02:23,392 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 00:02:23,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=265380.0, ans=0.2 2024-08-10 00:02:24,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0 2024-08-10 00:02:28,220 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2024-08-10 00:02:33,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=265380.0, ans=0.125 2024-08-10 00:02:37,155 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12050, loss[loss=0.1254, beats_loss=0.01198, ecapa_loss=0.0002956, whisper_loss=0.1105, over 20292.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01265, ecapa_loss=0.0003158, whisper_loss=0.09963, over 3881738.49 frames. 
], batch size: 77, lr: 2.30e-02, grad_scale: 262144.0 2024-08-10 00:02:42,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=265480.0, ans=0.0 2024-08-10 00:02:48,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=265480.0, ans=0.09899494936611666 2024-08-10 00:03:00,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=265580.0, ans=0.125 2024-08-10 00:03:13,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=265680.0, ans=0.0 2024-08-10 00:03:34,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2024-08-10 00:03:36,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=265880.0, ans=0.125 2024-08-10 00:03:38,382 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.16 vs. limit=22.5 2024-08-10 00:03:45,638 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.800e+00 2024-08-10 00:03:47,856 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12100, loss[loss=0.1468, beats_loss=0.009858, ecapa_loss=0.0002725, whisper_loss=0.1342, over 23457.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01263, ecapa_loss=0.0003158, whisper_loss=0.09998, over 3895157.22 frames. ], batch size: 85, lr: 2.30e-02, grad_scale: 262144.0 2024-08-10 00:03:50,670 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 3.134e+01 3.753e+01 4.563e+01 7.245e+01, threshold=7.507e+01, percent-clipped=1.0 2024-08-10 00:03:55,103 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 00:03:56,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=265980.0, ans=0.125 2024-08-10 00:04:09,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=266080.0, ans=0.125 2024-08-10 00:04:17,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=266180.0, ans=0.2 2024-08-10 00:04:17,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=266180.0, ans=0.5 2024-08-10 00:04:23,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=266180.0, ans=0.0 2024-08-10 00:04:31,520 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 00:04:34,002 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-10 00:04:38,264 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 00:04:39,825 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-10 00:04:42,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=266380.0, ans=0.04949747468305833 2024-08-10 00:04:44,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=266380.0, ans=0.1 2024-08-10 00:04:54,684 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.07 vs. 
limit=22.5 2024-08-10 00:04:57,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=266480.0, ans=0.125 2024-08-10 00:04:57,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12150, loss[loss=0.1173, beats_loss=0.01341, ecapa_loss=0.0002894, whisper_loss=0.101, over 20950.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01272, ecapa_loss=0.0003136, whisper_loss=0.09958, over 3885033.35 frames. ], batch size: 83, lr: 2.30e-02, grad_scale: 262144.0 2024-08-10 00:05:09,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.64 vs. limit=22.5 2024-08-10 00:05:19,145 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-10 00:05:19,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=266580.0, ans=0.1 2024-08-10 00:05:33,413 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 13 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 00:05:34,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=266680.0, ans=0.0 2024-08-10 00:05:37,894 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 00:05:46,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=266780.0, ans=0.09899494936611666 2024-08-10 00:05:59,099 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-10 00:06:03,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.56 vs. 
limit=15.0 2024-08-10 00:06:06,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=266880.0, ans=0.0 2024-08-10 00:06:08,201 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12200, loss[loss=0.1455, beats_loss=0.008369, ecapa_loss=0.0003324, whisper_loss=0.1338, over 22740.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01272, ecapa_loss=0.0003116, whisper_loss=0.09944, over 3850150.61 frames. ], batch size: 88, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:06:11,093 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.872e+01 3.325e+01 3.813e+01 6.794e+01, threshold=6.650e+01, percent-clipped=0.0 2024-08-10 00:06:34,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=267080.0, ans=0.0 2024-08-10 00:06:41,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=267180.0, ans=0.0 2024-08-10 00:06:46,517 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-10 00:06:57,838 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-10 00:06:58,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=267280.0, ans=0.025 2024-08-10 00:07:14,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=267380.0, ans=0.1 2024-08-10 00:07:18,278 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12250, loss[loss=0.09391, beats_loss=0.01373, ecapa_loss=0.0003035, whisper_loss=0.07714, over 20420.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01262, ecapa_loss=0.0003131, whisper_loss=0.09995, over 3845967.97 frames. 
], batch size: 83, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:07:22,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=267480.0, ans=0.0 2024-08-10 00:07:26,724 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-10 00:07:28,110 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 00:07:32,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.21 vs. limit=22.5 2024-08-10 00:07:55,942 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-10 00:07:58,686 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-10 00:07:59,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=267780.0, ans=0.0 2024-08-10 00:08:13,553 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 00:08:15,602 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5 2024-08-10 00:08:20,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=267880.0, ans=0.0 2024-08-10 00:08:23,427 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 00:08:23,928 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.074e-02 2024-08-10 00:08:27,385 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12300, loss[loss=0.1308, beats_loss=0.01296, ecapa_loss=0.0002757, whisper_loss=0.1151, over 22924.00 frames. 
], tot_loss[loss=0.1162, beats_loss=0.01255, ecapa_loss=0.0003133, whisper_loss=0.1005, over 3869487.84 frames. ], batch size: 87, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:08:30,252 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.986e+01 3.586e+01 4.164e+01 6.809e+01, threshold=7.172e+01, percent-clipped=1.0 2024-08-10 00:08:54,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=268180.0, ans=0.0 2024-08-10 00:09:16,946 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 13 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 00:09:17,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268280.0, ans=0.1 2024-08-10 00:09:22,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=268380.0, ans=0.125 2024-08-10 00:09:23,993 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 00:09:28,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=268380.0, ans=0.0 2024-08-10 00:09:36,213 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12350, loss[loss=0.09652, beats_loss=0.01329, ecapa_loss=0.0003443, whisper_loss=0.07978, over 13456.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01253, ecapa_loss=0.0003168, whisper_loss=0.1004, over 3848042.32 frames. ], batch size: 56, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:09:36,429 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
14 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 00:09:36,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=268480.0, ans=0.0 2024-08-10 00:09:38,609 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.54 vs. limit=22.5 2024-08-10 00:10:15,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=268680.0, ans=0.0 2024-08-10 00:10:27,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=268780.0, ans=0.0 2024-08-10 00:10:29,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=268780.0, ans=0.125 2024-08-10 00:10:35,824 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 00:10:48,256 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12400, loss[loss=0.1191, beats_loss=0.01196, ecapa_loss=0.0003282, whisper_loss=0.1038, over 16757.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01244, ecapa_loss=0.0003191, whisper_loss=0.1015, over 3869067.93 frames. ], batch size: 66, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:10:48,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=268980.0, ans=0.125 2024-08-10 00:10:50,962 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.997e+01 3.426e+01 4.143e+01 8.992e+01, threshold=6.852e+01, percent-clipped=1.0 2024-08-10 00:10:55,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268980.0, ans=0.1 2024-08-10 00:10:57,523 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. 
limit=15.0 2024-08-10 00:10:58,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=268980.0, ans=0.0 2024-08-10 00:10:58,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=268980.0, ans=0.125 2024-08-10 00:11:06,650 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2024-08-10 00:11:22,638 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 00:11:23,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=269180.0, ans=0.0 2024-08-10 00:11:38,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=15.0 2024-08-10 00:11:41,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=269280.0, ans=0.0 2024-08-10 00:11:51,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=269380.0, ans=0.2 2024-08-10 00:11:58,118 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12450, loss[loss=0.1265, beats_loss=0.009546, ecapa_loss=0.0003503, whisper_loss=0.1134, over 15260.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01247, ecapa_loss=0.0003198, whisper_loss=0.1012, over 3859241.39 frames. 
], batch size: 58, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:12:24,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=269580.0, ans=10.0 2024-08-10 00:12:24,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=269580.0, ans=0.0 2024-08-10 00:12:28,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.11 vs. limit=15.0 2024-08-10 00:12:57,620 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 00:12:59,252 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.733e+04 2024-08-10 00:13:00,422 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-10 00:13:03,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=269880.0, ans=0.125 2024-08-10 00:13:08,254 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12500, loss[loss=0.1287, beats_loss=0.01192, ecapa_loss=0.0003081, whisper_loss=0.1137, over 22224.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01247, ecapa_loss=0.000317, whisper_loss=0.1013, over 3879626.74 frames. 
], batch size: 89, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:13:10,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=269980.0, ans=0.1 2024-08-10 00:13:11,259 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 3.015e+01 3.443e+01 4.080e+01 3.263e+02, threshold=6.886e+01, percent-clipped=2.0 2024-08-10 00:13:23,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=270080.0, ans=0.125 2024-08-10 00:13:27,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270080.0, ans=0.1 2024-08-10 00:13:41,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2024-08-10 00:13:43,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=270180.0, ans=10.0 2024-08-10 00:13:52,643 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 00:13:58,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=270280.0, ans=0.2 2024-08-10 00:13:58,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.51 vs. limit=22.5 2024-08-10 00:13:59,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.55 vs. 
limit=15.0 2024-08-10 00:14:08,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=270380.0, ans=0.0 2024-08-10 00:14:17,278 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12550, loss[loss=0.09821, beats_loss=0.01525, ecapa_loss=0.0002502, whisper_loss=0.08045, over 16473.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01256, ecapa_loss=0.0003153, whisper_loss=0.1011, over 3904523.15 frames. ], batch size: 65, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:14:21,741 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 00:14:29,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=270480.0, ans=0.0 2024-08-10 00:14:40,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=270580.0, ans=0.0 2024-08-10 00:14:42,404 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 25 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-10 00:15:06,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=270780.0, ans=0.0 2024-08-10 00:15:12,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=270880.0, ans=0.125 2024-08-10 00:15:16,339 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 00:15:20,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=270880.0, ans=0.025 2024-08-10 00:15:27,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12600, loss[loss=0.1361, beats_loss=0.01085, ecapa_loss=0.0003326, whisper_loss=0.1219, over 21625.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01258, ecapa_loss=0.0003149, whisper_loss=0.1018, over 3907341.65 frames. 
], batch size: 89, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:15:28,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=270980.0, ans=0.0 2024-08-10 00:15:30,395 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+01 3.077e+01 3.630e+01 3.984e+01 7.187e+01, threshold=7.260e+01, percent-clipped=1.0 2024-08-10 00:15:42,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2024-08-10 00:15:53,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=271080.0, ans=0.125 2024-08-10 00:16:13,016 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 00:16:20,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=271280.0, ans=0.2 2024-08-10 00:16:22,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=271380.0, ans=0.0 2024-08-10 00:16:37,754 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12650, loss[loss=0.1235, beats_loss=0.0114, ecapa_loss=0.0003521, whisper_loss=0.1085, over 14474.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01256, ecapa_loss=0.0003162, whisper_loss=0.1009, over 3869030.70 frames. ], batch size: 58, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:16:38,485 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.95 vs. limit=15.0 2024-08-10 00:16:43,688 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
17 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 00:16:44,032 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.175e+03 2024-08-10 00:17:02,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=271580.0, ans=0.0 2024-08-10 00:17:12,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=271680.0, ans=0.05 2024-08-10 00:17:22,785 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 00:17:33,842 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 00:17:39,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.15 vs. limit=15.0 2024-08-10 00:17:41,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=271880.0, ans=0.125 2024-08-10 00:17:46,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=271980.0, ans=10.0 2024-08-10 00:17:47,616 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12700, loss[loss=0.1311, beats_loss=0.01066, ecapa_loss=0.0003386, whisper_loss=0.117, over 22629.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01249, ecapa_loss=0.0003156, whisper_loss=0.1006, over 3858886.84 frames. ], batch size: 91, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:17:48,466 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. 
limit=15.0 2024-08-10 00:17:50,115 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 3.012e+01 3.366e+01 3.844e+01 6.101e+01, threshold=6.733e+01, percent-clipped=0.0 2024-08-10 00:17:57,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=271980.0, ans=0.0 2024-08-10 00:18:00,774 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2024-08-10 00:18:10,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=272080.0, ans=0.125 2024-08-10 00:18:24,682 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2024-08-10 00:18:29,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=272280.0, ans=0.0 2024-08-10 00:18:42,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=272380.0, ans=0.125 2024-08-10 00:18:57,480 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12750, loss[loss=0.1004, beats_loss=0.01364, ecapa_loss=0.0003408, whisper_loss=0.08335, over 21036.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.0125, ecapa_loss=0.0003173, whisper_loss=0.1007, over 3871346.19 frames. ], batch size: 88, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:18:59,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.37 vs. 
limit=15.0 2024-08-10 00:19:05,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=272480.0, ans=0.2 2024-08-10 00:19:06,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=272480.0, ans=0.1 2024-08-10 00:19:49,594 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 00:19:57,963 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 00:20:01,438 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=12.0 2024-08-10 00:20:04,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=272880.0, ans=0.0 2024-08-10 00:20:04,312 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.245e+00 2024-08-10 00:20:07,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12800, loss[loss=0.08008, beats_loss=0.01397, ecapa_loss=0.0003479, whisper_loss=0.06263, over 14259.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01247, ecapa_loss=0.0003201, whisper_loss=0.1012, over 3893569.22 frames. ], batch size: 61, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:20:10,370 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.474e+01 2.990e+01 3.546e+01 4.142e+01 8.927e+01, threshold=7.091e+01, percent-clipped=1.0 2024-08-10 00:20:31,370 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.52 vs. 
limit=15.0 2024-08-10 00:20:39,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=273180.0, ans=0.125 2024-08-10 00:20:39,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2024-08-10 00:20:40,846 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.39 vs. limit=15.0 2024-08-10 00:20:50,359 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 00:21:02,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0 2024-08-10 00:21:12,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=273380.0, ans=0.125 2024-08-10 00:21:17,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=273480.0, ans=0.1 2024-08-10 00:21:18,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0 2024-08-10 00:21:18,437 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12850, loss[loss=0.09847, beats_loss=0.01508, ecapa_loss=0.0002923, whisper_loss=0.08048, over 17055.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.0126, ecapa_loss=0.0003192, whisper_loss=0.09985, over 3881102.96 frames. ], batch size: 69, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:21:21,093 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
24 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 00:21:24,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=273480.0, ans=0.125 2024-08-10 00:21:24,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=273480.0, ans=0.04949747468305833 2024-08-10 00:21:39,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=273580.0, ans=0.1 2024-08-10 00:21:42,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=273580.0, ans=0.125 2024-08-10 00:21:45,093 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 00:21:59,171 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 00:22:06,270 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 27 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 00:22:07,884 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=8.102e-03 2024-08-10 00:22:16,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=15.0 2024-08-10 00:22:27,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=273980.0, ans=0.1 2024-08-10 00:22:28,300 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12900, loss[loss=0.1441, beats_loss=0.0112, ecapa_loss=0.0003402, whisper_loss=0.1294, over 18370.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01257, ecapa_loss=0.0003188, whisper_loss=0.09953, over 3879269.79 frames. 
], batch size: 72, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:22:31,162 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 3.013e+01 3.364e+01 3.931e+01 6.029e+01, threshold=6.729e+01, percent-clipped=0.0 2024-08-10 00:22:40,443 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 00:22:43,374 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 00:23:10,568 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 31 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 00:23:16,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=274280.0, ans=0.125 2024-08-10 00:23:21,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2024-08-10 00:23:22,701 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2024-08-10 00:23:30,706 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 31 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 00:23:32,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=22.5 2024-08-10 00:23:35,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=274380.0, ans=0.1 2024-08-10 00:23:40,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 12950, loss[loss=0.1233, beats_loss=0.01117, ecapa_loss=0.000357, whisper_loss=0.1086, over 23420.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01243, ecapa_loss=0.0003185, whisper_loss=0.1004, over 3900357.29 frames. 
], batch size: 94, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:23:44,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=274480.0, ans=0.125 2024-08-10 00:23:54,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=274580.0, ans=0.125 2024-08-10 00:24:15,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=274680.0, ans=0.1 2024-08-10 00:24:29,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=274780.0, ans=0.2 2024-08-10 00:24:30,831 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 31 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 00:24:38,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=274880.0, ans=0.2 2024-08-10 00:24:43,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=274880.0, ans=0.2 2024-08-10 00:24:50,670 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13000, loss[loss=0.1054, beats_loss=0.01388, ecapa_loss=0.0003072, whisper_loss=0.08845, over 21257.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01249, ecapa_loss=0.0003152, whisper_loss=0.1008, over 3899216.42 frames. ], batch size: 88, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:24:51,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.98 vs. limit=22.5 2024-08-10 00:24:53,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.907e+01 3.154e+01 3.704e+01 5.779e+01, threshold=6.309e+01, percent-clipped=0.0 2024-08-10 00:24:54,925 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
26 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-10 00:25:09,137 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 00:25:19,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=275180.0, ans=0.0 2024-08-10 00:25:28,521 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 00:25:30,034 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 19 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-10 00:25:39,851 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 20 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-10 00:25:40,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.12 vs. limit=22.5 2024-08-10 00:25:42,566 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 23 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-10 00:25:47,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.96 vs. limit=12.0 2024-08-10 00:25:56,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=275380.0, ans=0.1 2024-08-10 00:26:01,097 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13050, loss[loss=0.1275, beats_loss=0.01359, ecapa_loss=0.0002643, whisper_loss=0.1112, over 20362.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01258, ecapa_loss=0.0003147, whisper_loss=0.1004, over 3924164.92 frames. 
], batch size: 79, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:26:09,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=275480.0, ans=0.125 2024-08-10 00:26:41,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=275680.0, ans=0.0 2024-08-10 00:26:41,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=275680.0, ans=0.125 2024-08-10 00:26:52,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.92 vs. limit=15.0 2024-08-10 00:26:54,850 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 00:26:55,360 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=12.0 2024-08-10 00:26:59,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=275880.0, ans=0.125 2024-08-10 00:27:07,024 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 00:27:09,705 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 00:27:12,073 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13100, loss[loss=0.1098, beats_loss=0.009529, ecapa_loss=0.0003241, whisper_loss=0.09703, over 15804.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01257, ecapa_loss=0.0003122, whisper_loss=0.1003, over 3874462.60 frames. 
], batch size: 59, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:27:14,980 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.977e+01 3.328e+01 3.884e+01 7.929e+01, threshold=6.656e+01, percent-clipped=3.0 2024-08-10 00:27:49,281 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 00:27:49,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.55 vs. limit=15.0 2024-08-10 00:27:52,064 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 00:28:02,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=15.0 2024-08-10 00:28:12,523 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-10 00:28:23,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13150, loss[loss=0.1296, beats_loss=0.01204, ecapa_loss=0.0002935, whisper_loss=0.1147, over 24052.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01259, ecapa_loss=0.0003122, whisper_loss=0.1002, over 3877836.89 frames. ], batch size: 95, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:28:35,371 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.49 vs. limit=15.0 2024-08-10 00:28:36,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=276580.0, ans=0.2 2024-08-10 00:28:54,642 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 00:29:05,505 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
30 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-10 00:29:13,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=276780.0, ans=0.1 2024-08-10 00:29:17,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=276780.0, ans=0.09899494936611666 2024-08-10 00:29:21,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=276880.0, ans=0.0 2024-08-10 00:29:23,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=276880.0, ans=0.0 2024-08-10 00:29:33,300 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13200, loss[loss=0.1036, beats_loss=0.01236, ecapa_loss=0.0002632, whisper_loss=0.08859, over 17728.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01254, ecapa_loss=0.0003134, whisper_loss=0.1003, over 3875900.68 frames. ], batch size: 68, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:29:34,918 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 12 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 00:29:36,041 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.260e+01 3.048e+01 3.557e+01 4.616e+01 6.724e+01, threshold=7.115e+01, percent-clipped=1.0 2024-08-10 00:29:39,249 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 00:29:57,937 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 00:29:58,747 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.50 vs. limit=15.0 2024-08-10 00:30:04,881 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
23 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-10 00:30:17,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=277280.0, ans=0.125 2024-08-10 00:30:19,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=277280.0, ans=0.0 2024-08-10 00:30:28,125 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 00:30:32,353 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 00:30:43,176 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13250, loss[loss=0.109, beats_loss=0.01152, ecapa_loss=0.0002841, whisper_loss=0.09467, over 18454.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01252, ecapa_loss=0.0003118, whisper_loss=0.1008, over 3868579.55 frames. ], batch size: 71, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:30:53,054 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 00:31:05,993 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 00:31:10,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=277680.0, ans=0.125 2024-08-10 00:31:17,438 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.54 vs. 
limit=15.0 2024-08-10 00:31:19,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=277680.0, ans=0.125 2024-08-10 00:31:21,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=277680.0, ans=0.0 2024-08-10 00:31:21,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277680.0, ans=0.1 2024-08-10 00:31:43,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=277880.0, ans=0.125 2024-08-10 00:31:56,261 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13300, loss[loss=0.1019, beats_loss=0.01648, ecapa_loss=0.0002352, whisper_loss=0.0831, over 20208.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01256, ecapa_loss=0.0003096, whisper_loss=0.1004, over 3883203.23 frames. ], batch size: 82, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:31:57,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=277980.0, ans=0.125 2024-08-10 00:31:59,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.953e+01 3.236e+01 3.823e+01 6.068e+01, threshold=6.472e+01, percent-clipped=0.0 2024-08-10 00:32:10,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2024-08-10 00:32:17,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.54 vs. 
limit=15.0 2024-08-10 00:32:17,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=278080.0, ans=0.125 2024-08-10 00:32:25,130 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-10 00:32:34,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=278180.0, ans=0.125 2024-08-10 00:32:42,681 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-10 00:32:50,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=278280.0, ans=0.07 2024-08-10 00:33:03,859 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 00:33:05,212 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 00:33:14,285 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13350, loss[loss=0.1461, beats_loss=0.01002, ecapa_loss=0.0003239, whisper_loss=0.1328, over 21972.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01256, ecapa_loss=0.000311, whisper_loss=0.1006, over 3892347.21 frames. ], batch size: 81, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:33:20,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=278480.0, ans=0.5 2024-08-10 00:33:34,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=278580.0, ans=0.125 2024-08-10 00:33:35,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=278580.0, ans=0.0 2024-08-10 00:33:37,189 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
24 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-10 00:33:42,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=278580.0, ans=0.0 2024-08-10 00:33:56,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=278680.0, ans=0.2 2024-08-10 00:34:24,250 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-10 00:34:31,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13400, loss[loss=0.1141, beats_loss=0.01052, ecapa_loss=0.0003825, whisper_loss=0.09977, over 20232.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01245, ecapa_loss=0.0003136, whisper_loss=0.1014, over 3885582.26 frames. ], batch size: 86, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:34:34,747 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.868e+01 3.242e+01 3.595e+01 7.666e+01, threshold=6.483e+01, percent-clipped=2.0 2024-08-10 00:34:41,539 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0 2024-08-10 00:34:57,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=279080.0, ans=0.2 2024-08-10 00:35:06,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=279180.0, ans=0.0 2024-08-10 00:35:09,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=279180.0, ans=0.125 2024-08-10 00:35:11,455 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. 
limit=15.0 2024-08-10 00:35:12,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=279180.0, ans=0.0 2024-08-10 00:35:22,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=279280.0, ans=0.125 2024-08-10 00:35:41,211 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-10 00:35:48,268 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13450, loss[loss=0.1305, beats_loss=0.01136, ecapa_loss=0.0002673, whisper_loss=0.1165, over 23116.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01253, ecapa_loss=0.0003149, whisper_loss=0.1006, over 3892980.65 frames. ], batch size: 89, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:35:56,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=279480.0, ans=0.125 2024-08-10 00:36:33,894 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 00:36:39,599 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 12 from Vox, 46 fro AS 2024-08-10 00:36:50,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=279880.0, ans=0.125 2024-08-10 00:36:53,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=279880.0, ans=0.1 2024-08-10 00:37:04,119 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-10 00:37:06,885 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13500, loss[loss=0.0781, beats_loss=0.01525, ecapa_loss=0.000377, whisper_loss=0.05908, over 12310.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.0126, ecapa_loss=0.0003141, whisper_loss=0.1005, over 3880499.05 frames. 
], batch size: 54, lr: 2.24e-02, grad_scale: 262144.0 2024-08-10 00:37:13,007 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 3.053e+01 3.516e+01 4.040e+01 7.643e+01, threshold=7.031e+01, percent-clipped=3.0 2024-08-10 00:37:20,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=279980.0, ans=0.1 2024-08-10 00:37:41,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=280180.0, ans=0.125 2024-08-10 00:37:58,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2024-08-10 00:38:01,742 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 00:38:02,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=280280.0, ans=0.0 2024-08-10 00:38:17,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.53 vs. limit=22.5 2024-08-10 00:38:22,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=280380.0, ans=0.125 2024-08-10 00:38:24,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13550, loss[loss=0.1106, beats_loss=0.01265, ecapa_loss=0.0003253, whisper_loss=0.09475, over 19887.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01266, ecapa_loss=0.0003117, whisper_loss=0.1001, over 3872682.18 frames. 
], batch size: 80, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:38:49,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=280580.0, ans=0.0 2024-08-10 00:38:56,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=280680.0, ans=0.07 2024-08-10 00:39:04,017 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.04 vs. limit=22.5 2024-08-10 00:39:07,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=280680.0, ans=0.125 2024-08-10 00:39:18,730 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 00:39:24,829 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 00:39:32,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.69 vs. limit=15.0 2024-08-10 00:39:41,688 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13600, loss[loss=0.134, beats_loss=0.009947, ecapa_loss=0.0003117, whisper_loss=0.121, over 24272.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01261, ecapa_loss=0.0003118, whisper_loss=0.1001, over 3873036.05 frames. 
], batch size: 92, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:39:44,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=280980.0, ans=0.0 2024-08-10 00:39:44,913 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.214e+01 2.967e+01 3.461e+01 3.946e+01 7.975e+01, threshold=6.923e+01, percent-clipped=1.0 2024-08-10 00:39:45,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=280980.0, ans=0.1 2024-08-10 00:39:50,348 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 00:40:02,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=281080.0, ans=0.0 2024-08-10 00:40:17,931 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 00:40:25,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2024-08-10 00:40:36,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.05 vs. limit=15.0 2024-08-10 00:40:47,268 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 00:40:49,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=281380.0, ans=0.0 2024-08-10 00:40:55,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=281380.0, ans=0.2 2024-08-10 00:40:58,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=281380.0, ans=0.125 2024-08-10 00:41:00,930 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13650, loss[loss=0.1239, beats_loss=0.01137, ecapa_loss=0.0003636, whisper_loss=0.1089, over 19919.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01265, ecapa_loss=0.0003129, whisper_loss=0.09981, over 3895027.78 frames. ], batch size: 82, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:41:01,924 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 00:41:54,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=281780.0, ans=0.0 2024-08-10 00:42:04,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=281780.0, ans=0.125 2024-08-10 00:42:05,031 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 13 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 00:42:16,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=281880.0, ans=0.125 2024-08-10 00:42:16,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=15.0 2024-08-10 00:42:18,450 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.37 vs. 
limit=15.0 2024-08-10 00:42:22,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13700, loss[loss=0.1377, beats_loss=0.01082, ecapa_loss=0.0003188, whisper_loss=0.1237, over 19383.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01268, ecapa_loss=0.0003125, whisper_loss=0.09998, over 3919729.11 frames. ], batch size: 76, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:42:23,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.52 vs. limit=10.0 2024-08-10 00:42:25,240 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+01 2.951e+01 3.261e+01 3.919e+01 6.807e+01, threshold=6.522e+01, percent-clipped=0.0 2024-08-10 00:42:25,918 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 00:42:28,000 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=15.0 2024-08-10 00:42:45,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.56 vs. limit=22.5 2024-08-10 00:42:58,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=282180.0, ans=0.125 2024-08-10 00:43:04,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=282180.0, ans=0.0 2024-08-10 00:43:11,639 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 27 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-10 00:43:26,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.53 vs. 
limit=15.0 2024-08-10 00:43:27,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=282380.0, ans=0.0 2024-08-10 00:43:44,155 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13750, loss[loss=0.1333, beats_loss=0.01378, ecapa_loss=0.0002788, whisper_loss=0.1167, over 21170.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01261, ecapa_loss=0.000314, whisper_loss=0.1003, over 3904363.79 frames. ], batch size: 84, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:44:18,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=282680.0, ans=0.2 2024-08-10 00:44:35,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=282780.0, ans=0.1 2024-08-10 00:44:47,527 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.90 vs. limit=22.5 2024-08-10 00:44:53,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=12.0 2024-08-10 00:45:02,120 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13800, loss[loss=0.1147, beats_loss=0.01044, ecapa_loss=0.0004103, whisper_loss=0.1001, over 18996.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.0126, ecapa_loss=0.0003124, whisper_loss=0.1, over 3886764.88 frames. ], batch size: 79, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:45:06,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.944e+01 3.294e+01 3.829e+01 5.391e+01, threshold=6.589e+01, percent-clipped=0.0 2024-08-10 00:45:26,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=283080.0, ans=0.125 2024-08-10 00:45:36,860 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
34 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-10 00:46:02,812 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 00:46:03,793 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.542e+00 2024-08-10 00:46:25,691 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13850, loss[loss=0.1137, beats_loss=0.01256, ecapa_loss=0.000377, whisper_loss=0.09737, over 20870.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01252, ecapa_loss=0.0003112, whisper_loss=0.1002, over 3891980.85 frames. ], batch size: 89, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:46:26,196 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-10 00:46:28,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2024-08-10 00:46:31,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=283480.0, ans=0.125 2024-08-10 00:46:31,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=283480.0, ans=0.125 2024-08-10 00:46:37,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=283480.0, ans=0.125 2024-08-10 00:46:43,846 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 00:46:58,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=283680.0, ans=0.0 2024-08-10 00:47:02,321 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.45 vs. 
limit=22.5 2024-08-10 00:47:03,019 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 00:47:03,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=283680.0, ans=0.125 2024-08-10 00:47:47,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13900, loss[loss=0.1051, beats_loss=0.01457, ecapa_loss=0.0002896, whisper_loss=0.08767, over 18986.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01248, ecapa_loss=0.0003082, whisper_loss=0.101, over 3934046.47 frames. ], batch size: 76, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:47:50,860 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.945e+01 3.348e+01 3.878e+01 5.863e+01, threshold=6.696e+01, percent-clipped=0.0 2024-08-10 00:48:07,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=284080.0, ans=0.125 2024-08-10 00:48:13,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=284080.0, ans=0.125 2024-08-10 00:48:23,574 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 8 from Vox, 32 fro AS 2024-08-10 00:48:38,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.68 vs. limit=15.0 2024-08-10 00:48:45,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=284280.0, ans=0.1 2024-08-10 00:48:57,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.29 vs. 
limit=10.0 2024-08-10 00:49:05,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284380.0, ans=0.1 2024-08-10 00:49:09,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 13950, loss[loss=0.1236, beats_loss=0.01429, ecapa_loss=0.0002144, whisper_loss=0.1072, over 18743.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01242, ecapa_loss=0.0003088, whisper_loss=0.1018, over 3949722.53 frames. ], batch size: 69, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:50:03,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=284780.0, ans=0.0 2024-08-10 00:50:13,723 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 00:50:18,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=284880.0, ans=0.2 2024-08-10 00:50:23,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0 2024-08-10 00:50:33,117 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 14000, loss[loss=0.1104, beats_loss=0.01446, ecapa_loss=0.0002295, whisper_loss=0.09369, over 20744.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01246, ecapa_loss=0.0003067, whisper_loss=0.1017, over 3917551.54 frames. ], batch size: 80, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:50:35,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.957e+01 3.357e+01 3.952e+01 6.248e+01, threshold=6.715e+01, percent-clipped=0.0 2024-08-10 00:50:40,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=12.0 2024-08-10 00:50:43,044 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 00:50:53,810 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.08 vs. limit=6.0 2024-08-10 00:50:56,140 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 00:50:57,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=285080.0, ans=0.0 2024-08-10 00:51:03,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.38 vs. limit=22.5 2024-08-10 00:51:07,702 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 00:51:08,947 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 00:51:18,346 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0 2024-08-10 00:51:28,738 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.60 vs. limit=10.0 2024-08-10 00:51:54,289 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 14050, loss[loss=0.09426, beats_loss=0.01878, ecapa_loss=0.0002054, whisper_loss=0.07342, over 13912.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01254, ecapa_loss=0.0003046, whisper_loss=0.1006, over 3878571.09 frames. ], batch size: 54, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:51:57,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=285480.0, ans=0.125 2024-08-10 00:52:03,205 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
19 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 00:52:04,366 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.49 vs. limit=15.0 2024-08-10 00:52:23,893 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 00:52:33,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=285680.0, ans=0.125 2024-08-10 00:52:42,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.47 vs. limit=10.0 2024-08-10 00:52:48,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=285780.0, ans=0.1 2024-08-10 00:52:50,025 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 00:53:02,521 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 00:53:14,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=285980.0, ans=0.2 2024-08-10 00:53:15,401 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 14100, loss[loss=0.1001, beats_loss=0.01744, ecapa_loss=0.000226, whisper_loss=0.08039, over 21632.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01259, ecapa_loss=0.000304, whisper_loss=0.1007, over 3879971.82 frames. ], batch size: 86, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:53:18,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 2.998e+01 3.654e+01 4.043e+01 1.341e+02, threshold=7.307e+01, percent-clipped=1.0 2024-08-10 00:53:19,035 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 00:53:25,448 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
25 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-10 00:53:32,468 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 00:53:46,227 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=15.0 2024-08-10 00:54:35,515 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 14150, loss[loss=0.1488, beats_loss=0.01212, ecapa_loss=0.0003291, whisper_loss=0.1334, over 21898.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01254, ecapa_loss=0.0003073, whisper_loss=0.1015, over 3872169.05 frames. ], batch size: 89, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:54:40,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=286480.0, ans=0.09899494936611666 2024-08-10 00:55:06,997 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 00:55:17,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=286680.0, ans=0.0 2024-08-10 00:55:18,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=286680.0, ans=0.125 2024-08-10 00:55:32,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=286780.0, ans=0.125 2024-08-10 00:55:45,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=286880.0, ans=0.125 2024-08-10 00:55:46,236 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 00:55:53,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 14200, loss[loss=0.1112, beats_loss=0.01166, ecapa_loss=0.0002649, whisper_loss=0.09689, over 21382.00 frames. 
], tot_loss[loss=0.1166, beats_loss=0.01262, ecapa_loss=0.0003066, whisper_loss=0.1009, over 3897670.28 frames. ], batch size: 83, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:55:58,023 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 3.000e+01 3.388e+01 3.894e+01 5.742e+01, threshold=6.776e+01, percent-clipped=0.0 2024-08-10 00:56:12,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=287080.0, ans=0.035 2024-08-10 00:56:30,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=287080.0, ans=0.0 2024-08-10 00:56:35,599 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 36 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 00:57:01,406 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 00:57:38,514 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 14250, loss[loss=0.1169, beats_loss=0.009419, ecapa_loss=0.0003084, whisper_loss=0.1044, over 19492.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01262, ecapa_loss=0.0003074, whisper_loss=0.1005, over 3898022.65 frames. ], batch size: 74, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:57:48,525 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 00:58:05,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=287580.0, ans=0.1 2024-08-10 00:58:12,280 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 00:58:20,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=287680.0, ans=0.125 2024-08-10 00:59:09,717 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
28 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 00:59:14,149 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 14300, loss[loss=0.1063, beats_loss=0.01588, ecapa_loss=0.000281, whisper_loss=0.08762, over 22322.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01262, ecapa_loss=0.0003081, whisper_loss=0.1005, over 3904817.48 frames. ], batch size: 90, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:59:19,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.453e+01 3.147e+01 3.620e+01 4.133e+01 1.421e+02, threshold=7.240e+01, percent-clipped=1.0 2024-08-10 00:59:36,719 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 00:59:50,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=288080.0, ans=0.0 2024-08-10 01:00:33,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs. limit=10.0 2024-08-10 01:00:48,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=288380.0, ans=0.5 2024-08-10 01:00:51,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=288380.0, ans=0.0 2024-08-10 01:01:00,873 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 01:01:03,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=288380.0, ans=0.125 2024-08-10 01:01:12,141 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 14350, loss[loss=0.1146, beats_loss=0.01429, ecapa_loss=0.0003155, whisper_loss=0.0972, over 23025.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01271, ecapa_loss=0.0003081, whisper_loss=0.0997, over 3876330.70 frames. 
], batch size: 93, lr: 2.21e-02, grad_scale: 524288.0 2024-08-10 01:01:40,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=288580.0, ans=0.2 2024-08-10 01:01:57,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288680.0, ans=0.1 2024-08-10 01:02:15,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=288680.0, ans=0.2 2024-08-10 01:02:24,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.52 vs. limit=10.0 2024-08-10 01:03:04,506 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 01:03:08,882 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 14400, loss[loss=0.1367, beats_loss=0.01081, ecapa_loss=0.0002693, whisper_loss=0.1232, over 24004.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01263, ecapa_loss=0.0003082, whisper_loss=0.1002, over 3898610.84 frames. ], batch size: 89, lr: 2.21e-02, grad_scale: 524288.0 2024-08-10 01:03:13,729 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.997e+01 3.365e+01 3.798e+01 7.821e+01, threshold=6.729e+01, percent-clipped=1.0 2024-08-10 01:03:19,733 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 01:03:22,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-08-10 01:03:40,137 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
26 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 01:03:41,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=289080.0, ans=0.0 2024-08-10 01:04:14,980 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 01:04:24,293 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 01:04:42,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=289380.0, ans=0.125 2024-08-10 01:04:45,092 INFO [train_multi_KD3.py:1116] (2/4) Epoch 2, batch 14450, loss[loss=0.09442, beats_loss=0.01455, ecapa_loss=0.0003877, whisper_loss=0.07599, over 19288.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01265, ecapa_loss=0.0003073, whisper_loss=0.09991, over 3885967.52 frames. ], batch size: 81, lr: 2.21e-02, grad_scale: 524288.0 2024-08-10 01:04:48,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=289480.0, ans=0.125 2024-08-10 01:04:52,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=289480.0, ans=0.125 2024-08-10 01:04:58,337 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. 
limit=15.0 2024-08-10 01:05:25,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=289680.0, ans=0.0 2024-08-10 01:05:34,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=289780.0, ans=0.5 2024-08-10 01:05:36,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=289780.0, ans=0.0 2024-08-10 01:05:37,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=289780.0, ans=0.035 2024-08-10 01:06:23,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 0, loss[loss=0.1293, beats_loss=0.01139, ecapa_loss=0.0003198, whisper_loss=0.1147, over 16763.00 frames. ], tot_loss[loss=0.1293, beats_loss=0.01139, ecapa_loss=0.0003198, whisper_loss=0.1147, over 16763.00 frames. ], batch size: 64, lr: 2.10e-02, grad_scale: 524288.0 2024-08-10 01:06:23,831 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 01:07:07,530 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on ASR_libri: loss=0.2782, beats_loss=0, ecapa_loss=0.0009143, whisper_loss=0.2691, over 922467.00 frames. 2024-08-10 01:07:23,564 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on SV_voxceleb1: loss=0.008083, beats_loss=0, ecapa_loss=0.0008083, whisper_loss=0, over 939242.00 frames. 2024-08-10 01:09:28,054 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on AT_audioset: loss=0.02889, beats_loss=0.02889, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 01:09:28,057 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 01:09:29,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=289880.0, ans=0.125 2024-08-10 01:09:39,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=289880.0, ans=0.2 2024-08-10 01:09:46,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=289880.0, ans=0.0 2024-08-10 01:10:02,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 3.015e+01 3.420e+01 3.932e+01 5.377e+01, threshold=6.841e+01, percent-clipped=0.0 2024-08-10 01:10:08,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2024-08-10 01:10:31,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=290080.0, ans=0.125 2024-08-10 01:11:42,117 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 50, loss[loss=0.1118, beats_loss=0.01189, ecapa_loss=0.0003369, whisper_loss=0.09656, over 23501.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01247, ecapa_loss=0.000312, whisper_loss=0.1007, over 901664.61 frames. ], batch size: 92, lr: 2.10e-02, grad_scale: 524288.0 2024-08-10 01:11:46,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=290380.0, ans=12.0 2024-08-10 01:11:59,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=290380.0, ans=0.125 2024-08-10 01:12:01,270 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
14 from LS+wenet, 13 from Vox, 28 from AS 2024-08-10 01:12:42,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=290580.0, ans=0.0 2024-08-10 01:12:43,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=290580.0, ans=0.0 2024-08-10 01:13:15,482 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.838e+00 2024-08-10 01:13:20,105 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 from AS 2024-08-10 01:13:20,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=290780.0, ans=0.125 2024-08-10 01:13:25,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=290780.0, ans=0.125 2024-08-10 01:13:26,201 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=22.5 2024-08-10 01:13:37,131 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 from AS 2024-08-10 01:13:46,036 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.51 vs. limit=15.0 2024-08-10 01:13:48,623 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 100, loss[loss=0.1323, beats_loss=0.01326, ecapa_loss=0.0002917, whisper_loss=0.1162, over 22229.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01248, ecapa_loss=0.0003087, whisper_loss=0.09931, over 1557202.40 frames.
], batch size: 91, lr: 2.10e-02, grad_scale: 524288.0 2024-08-10 01:13:55,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290880.0, ans=0.1 2024-08-10 01:14:03,328 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 from AS 2024-08-10 01:14:17,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=290980.0, ans=0.05 2024-08-10 01:14:18,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.624e+01 3.304e+01 3.835e+01 4.447e+01 6.801e+01, threshold=7.671e+01, percent-clipped=0.0 2024-08-10 01:14:20,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=290980.0, ans=0.0 2024-08-10 01:14:24,327 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 11 from Vox, 31 from AS 2024-08-10 01:15:04,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=291180.0, ans=0.125 2024-08-10 01:15:28,183 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 from AS 2024-08-10 01:15:41,693 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 01:15:44,450 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 150, loss[loss=0.09998, beats_loss=0.01332, ecapa_loss=0.0002583, whisper_loss=0.08408, over 15746.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01228, ecapa_loss=0.000306, whisper_loss=0.1002, over 2040823.61 frames. ], batch size: 65, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:15:53,308 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts.
36 from LS+wenet, 21 from Vox, 29 from AS 2024-08-10 01:15:58,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=291380.0, ans=0.2 2024-08-10 01:16:06,757 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 21 from Vox, 28 from AS 2024-08-10 01:16:25,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=291580.0, ans=0.0 2024-08-10 01:16:46,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=291680.0, ans=0.125 2024-08-10 01:16:47,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=16.04 vs. limit=15.0 2024-08-10 01:17:11,591 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 200, loss[loss=0.1188, beats_loss=0.0114, ecapa_loss=0.0003348, whisper_loss=0.104, over 22521.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01215, ecapa_loss=0.0003045, whisper_loss=0.1007, over 2430811.23 frames. ], batch size: 90, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:17:28,679 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 from AS 2024-08-10 01:17:31,882 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 3.029e+01 3.361e+01 3.912e+01 9.673e+01, threshold=6.721e+01, percent-clipped=1.0 2024-08-10 01:17:35,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2024-08-10 01:17:44,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=292080.0, ans=0.0 2024-08-10 01:17:53,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.90 vs.
limit=12.0 2024-08-10 01:17:56,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=292080.0, ans=0.0 2024-08-10 01:17:56,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=292080.0, ans=0.025 2024-08-10 01:18:18,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.67 vs. limit=10.0 2024-08-10 01:18:22,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=292280.0, ans=0.125 2024-08-10 01:18:28,980 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 from AS 2024-08-10 01:18:31,464 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 250, loss[loss=0.1348, beats_loss=0.01242, ecapa_loss=0.00026, whisper_loss=0.1198, over 20344.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01224, ecapa_loss=0.0002976, whisper_loss=0.1021, over 2770568.33 frames. ], batch size: 77, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:18:56,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=292480.0, ans=0.0 2024-08-10 01:18:56,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=292480.0, ans=0.0 2024-08-10 01:19:07,978 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
13 from LS+wenet, 28 from Vox, 46 from AS 2024-08-10 01:19:17,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=292680.0, ans=0.125 2024-08-10 01:19:46,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=292880.0, ans=0.2 2024-08-10 01:19:47,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 300, loss[loss=0.1179, beats_loss=0.01074, ecapa_loss=0.0003432, whisper_loss=0.1037, over 14022.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01212, ecapa_loss=0.0002984, whisper_loss=0.1004, over 2989625.96 frames. ], batch size: 58, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:19:48,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=292880.0, ans=0.0 2024-08-10 01:19:57,298 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 23 from Vox, 25 from AS 2024-08-10 01:20:01,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=292880.0, ans=0.0 2024-08-10 01:20:06,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-10 01:20:06,474 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 3.157e+01 3.521e+01 4.168e+01 6.266e+01, threshold=7.043e+01, percent-clipped=0.0 2024-08-10 01:20:11,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=292980.0, ans=0.125 2024-08-10 01:20:34,114 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 15 from Vox, 40 from AS 2024-08-10 01:20:48,857 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
26 from LS+wenet, 20 from Vox, 46 from AS 2024-08-10 01:21:02,046 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 33 from Vox, 38 from AS 2024-08-10 01:21:06,590 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 350, loss[loss=0.1352, beats_loss=0.01097, ecapa_loss=0.0003274, whisper_loss=0.1209, over 21613.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.0121, ecapa_loss=0.0002948, whisper_loss=0.1008, over 3190125.89 frames. ], batch size: 86, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:21:20,321 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 22 from Vox, 22 from AS 2024-08-10 01:21:31,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=293480.0, ans=0.125 2024-08-10 01:21:37,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=293580.0, ans=0.2 2024-08-10 01:21:38,663 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 from AS 2024-08-10 01:21:48,673 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 from AS 2024-08-10 01:21:49,120 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 01:21:50,175 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 6 from Vox, 29 from AS 2024-08-10 01:22:01,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=293680.0, ans=0.95 2024-08-10 01:22:08,429 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts.
24 from LS+wenet, 25 from Vox, 35 from AS 2024-08-10 01:22:10,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=293780.0, ans=0.0 2024-08-10 01:22:15,700 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 17 from Vox, 31 from AS 2024-08-10 01:22:20,762 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 from AS 2024-08-10 01:22:21,719 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 400, loss[loss=0.1149, beats_loss=0.01293, ecapa_loss=0.0003455, whisper_loss=0.09848, over 21970.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01199, ecapa_loss=0.0002946, whisper_loss=0.1014, over 3321798.12 frames. ], batch size: 92, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:22:39,678 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 2.898e+01 3.177e+01 4.000e+01 8.293e+01, threshold=6.353e+01, percent-clipped=1.0 2024-08-10 01:22:41,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=293980.0, ans=0.0 2024-08-10 01:22:41,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=293980.0, ans=0.0 2024-08-10 01:22:44,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=293980.0, ans=0.125 2024-08-10 01:22:44,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=293980.0, ans=0.05 2024-08-10 01:22:47,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.10 vs. limit=22.5 2024-08-10 01:22:55,527 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.86 vs.
limit=22.5 2024-08-10 01:22:57,844 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 12 from Vox, 33 from AS 2024-08-10 01:22:59,882 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.260e+00 2024-08-10 01:23:05,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=294180.0, ans=0.125 2024-08-10 01:23:07,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0 2024-08-10 01:23:13,086 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 13 from Vox, 38 from AS 2024-08-10 01:23:22,834 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=12.0 2024-08-10 01:23:33,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=294280.0, ans=0.0 2024-08-10 01:23:33,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=294280.0, ans=0.125 2024-08-10 01:23:36,129 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 from AS 2024-08-10 01:23:37,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 450, loss[loss=0.1072, beats_loss=0.0118, ecapa_loss=0.0002633, whisper_loss=0.09274, over 16919.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.0122, ecapa_loss=0.0002899, whisper_loss=0.09979, over 3447552.28 frames.
], batch size: 64, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:23:44,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=294380.0, ans=0.2 2024-08-10 01:23:47,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=294380.0, ans=0.1 2024-08-10 01:23:53,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=294480.0, ans=0.125 2024-08-10 01:24:07,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=294580.0, ans=0.0 2024-08-10 01:24:11,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=294580.0, ans=0.125 2024-08-10 01:24:41,168 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 from AS 2024-08-10 01:24:42,479 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 from AS 2024-08-10 01:24:43,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=294780.0, ans=0.125 2024-08-10 01:24:46,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=294780.0, ans=0.0 2024-08-10 01:24:48,756 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 13 from Vox, 43 from AS 2024-08-10 01:24:52,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 500, loss[loss=0.1138, beats_loss=0.01194, ecapa_loss=0.0002655, whisper_loss=0.0992, over 19942.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01224, ecapa_loss=0.0002893, whisper_loss=0.09945, over 3535388.19 frames.
], batch size: 76, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:25:04,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=294880.0, ans=0.2 2024-08-10 01:25:09,598 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+01 2.966e+01 3.370e+01 3.826e+01 6.580e+01, threshold=6.739e+01, percent-clipped=1.0 2024-08-10 01:25:25,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2024-08-10 01:25:30,670 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 30 from LS+wenet, 9 from Vox, 32 from AS 2024-08-10 01:25:34,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=295180.0, ans=0.0 2024-08-10 01:25:48,091 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 from AS 2024-08-10 01:25:55,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.18 vs. limit=6.0 2024-08-10 01:25:57,024 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 16 from Vox, 34 from AS 2024-08-10 01:26:00,975 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 26 from Vox, 40 from AS 2024-08-10 01:26:05,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 550, loss[loss=0.1352, beats_loss=0.009996, ecapa_loss=0.0003203, whisper_loss=0.122, over 19626.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01234, ecapa_loss=0.0002894, whisper_loss=0.09898, over 3606512.01 frames. ], batch size: 77, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:26:10,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.61 vs.
limit=15.0 2024-08-10 01:26:20,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=295480.0, ans=0.015 2024-08-10 01:27:18,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=295780.0, ans=0.0 2024-08-10 01:27:20,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 600, loss[loss=0.1005, beats_loss=0.01259, ecapa_loss=0.0003164, whisper_loss=0.08478, over 21529.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01235, ecapa_loss=0.0002893, whisper_loss=0.09893, over 3641107.03 frames. ], batch size: 90, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:27:21,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=295880.0, ans=0.0 2024-08-10 01:27:32,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=295880.0, ans=0.2 2024-08-10 01:27:32,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.07 vs. limit=15.0 2024-08-10 01:27:33,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=295880.0, ans=0.125 2024-08-10 01:27:36,723 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-08-10 01:27:38,356 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.875e+01 3.342e+01 3.961e+01 6.306e+01, threshold=6.685e+01, percent-clipped=0.0 2024-08-10 01:27:49,465 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
24 from LS+wenet, 19 from Vox, 39 from AS 2024-08-10 01:28:03,322 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 01:28:10,519 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 9 from LS+wenet, 20 from Vox, 29 from AS 2024-08-10 01:28:12,120 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 01:28:16,406 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 from AS 2024-08-10 01:28:24,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=296280.0, ans=0.125 2024-08-10 01:28:29,708 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 from AS 2024-08-10 01:28:36,080 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 650, loss[loss=0.1185, beats_loss=0.0133, ecapa_loss=0.0002301, whisper_loss=0.1029, over 22006.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01226, ecapa_loss=0.0002895, whisper_loss=0.09953, over 3699662.05 frames. ], batch size: 81, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:28:39,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=296380.0, ans=0.2 2024-08-10 01:28:42,080 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 from AS 2024-08-10 01:28:55,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=296480.0, ans=0.09899494936611666 2024-08-10 01:29:15,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=296580.0, ans=0.125 2024-08-10 01:29:21,384 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts.
19 from LS+wenet, 15 from Vox, 35 from AS 2024-08-10 01:29:21,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296680.0, ans=0.1 2024-08-10 01:29:24,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=296680.0, ans=0.2 2024-08-10 01:29:29,556 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=15.0 2024-08-10 01:29:33,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=296780.0, ans=0.125 2024-08-10 01:29:46,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=296780.0, ans=0.0 2024-08-10 01:29:48,946 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 700, loss[loss=0.1262, beats_loss=0.01132, ecapa_loss=0.0003116, whisper_loss=0.1117, over 16368.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01232, ecapa_loss=0.0002882, whisper_loss=0.09834, over 3697572.21 frames. ], batch size: 63, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:29:54,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=296880.0, ans=0.125 2024-08-10 01:29:58,240 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts.
25 from LS+wenet, 13 from Vox, 38 from AS 2024-08-10 01:29:59,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=296880.0, ans=0.0 2024-08-10 01:30:05,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=296980.0, ans=0.0 2024-08-10 01:30:07,538 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 2.824e+01 3.267e+01 4.012e+01 5.256e+01, threshold=6.535e+01, percent-clipped=0.0 2024-08-10 01:30:13,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=296980.0, ans=0.1 2024-08-10 01:30:14,230 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 24 from Vox, 33 from AS 2024-08-10 01:30:28,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.33 vs. limit=15.0 2024-08-10 01:30:36,890 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 20 from Vox, 23 from AS 2024-08-10 01:30:38,183 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 20 from LS+wenet, 26 from Vox, 46 from AS 2024-08-10 01:30:58,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2024-08-10 01:30:59,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=24.03 vs. limit=15.0 2024-08-10 01:30:59,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.37 vs.
limit=10.0 2024-08-10 01:31:03,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=297280.0, ans=0.125 2024-08-10 01:31:03,541 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2024-08-10 01:31:04,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=297380.0, ans=0.125 2024-08-10 01:31:04,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=297380.0, ans=0.125 2024-08-10 01:31:05,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 750, loss[loss=0.1426, beats_loss=0.008667, ecapa_loss=0.0003556, whisper_loss=0.1304, over 17406.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01228, ecapa_loss=0.0002872, whisper_loss=0.09843, over 3723918.96 frames. ], batch size: 65, lr: 2.07e-02, grad_scale: 524288.0 2024-08-10 01:31:11,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=297380.0, ans=0.125 2024-08-10 01:31:14,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=297380.0, ans=0.2 2024-08-10 01:31:20,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=297480.0, ans=0.0 2024-08-10 01:31:30,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297480.0, ans=0.1 2024-08-10 01:31:33,864 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 01:31:34,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.77 vs.
limit=12.0 2024-08-10 01:31:35,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=297580.0, ans=0.0 2024-08-10 01:31:45,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=297580.0, ans=0.125 2024-08-10 01:31:54,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.51 vs. limit=15.0 2024-08-10 01:32:01,356 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 12 from Vox, 23 from AS 2024-08-10 01:32:18,776 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 800, loss[loss=0.1069, beats_loss=0.0131, ecapa_loss=0.0002472, whisper_loss=0.09132, over 21547.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01225, ecapa_loss=0.0002869, whisper_loss=0.09868, over 3738081.33 frames. ], batch size: 84, lr: 2.07e-02, grad_scale: 524288.0 2024-08-10 01:32:19,426 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 24 from Vox, 26 from AS 2024-08-10 01:32:35,884 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.843e+01 3.241e+01 3.911e+01 6.650e+01, threshold=6.482e+01, percent-clipped=1.0 2024-08-10 01:32:36,369 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.245e+03 2024-08-10 01:32:43,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=297980.0, ans=0.125 2024-08-10 01:32:44,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=297980.0, ans=0.2 2024-08-10 01:32:48,876 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 from AS 2024-08-10 01:33:06,111 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts.
16 from LS+wenet, 21 from Vox, 27 from AS 2024-08-10 01:33:10,859 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 from AS 2024-08-10 01:33:13,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=298180.0, ans=0.125 2024-08-10 01:33:21,513 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 26 from Vox, 25 from AS 2024-08-10 01:33:31,754 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 from AS 2024-08-10 01:33:33,052 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 850, loss[loss=0.1063, beats_loss=0.01357, ecapa_loss=0.0002532, whisper_loss=0.09021, over 22358.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01228, ecapa_loss=0.0002857, whisper_loss=0.09814, over 3760821.57 frames. ], batch size: 91, lr: 2.07e-02, grad_scale: 524288.0 2024-08-10 01:33:44,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=298380.0, ans=0.1 2024-08-10 01:34:02,358 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 from AS 2024-08-10 01:34:05,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=298580.0, ans=15.0 2024-08-10 01:34:05,946 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.41 vs.
limit=10.0 2024-08-10 01:34:19,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=298680.0, ans=0.0 2024-08-10 01:34:40,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=298780.0, ans=0.04949747468305833 2024-08-10 01:34:45,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=298780.0, ans=0.125 2024-08-10 01:34:48,348 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 900, loss[loss=0.1018, beats_loss=0.01335, ecapa_loss=0.0002449, whisper_loss=0.08598, over 20908.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01232, ecapa_loss=0.0002846, whisper_loss=0.09836, over 3788242.99 frames. ], batch size: 84, lr: 2.07e-02, grad_scale: 524288.0 2024-08-10 01:34:56,461 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 from AS 2024-08-10 01:35:06,181 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.811e+01 3.274e+01 3.784e+01 5.899e+01, threshold=6.548e+01, percent-clipped=0.0 2024-08-10 01:35:27,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=299080.0, ans=0.125 2024-08-10 01:35:37,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=299180.0, ans=0.0 2024-08-10 01:35:39,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.93 vs. limit=6.0 2024-08-10 01:35:40,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.88 vs. limit=22.5 2024-08-10 01:35:43,851 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts.
14 from LS+wenet, 25 from Vox, 22 from AS
2024-08-10 01:36:03,237 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 950, loss[loss=0.1112, beats_loss=0.01156, ecapa_loss=0.0002506, whisper_loss=0.09715, over 18421.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01234, ecapa_loss=0.0002824, whisper_loss=0.09825, over 3779131.33 frames. ], batch size: 68, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:36:03,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=299380.0, ans=0.125
2024-08-10 01:36:08,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=299380.0, ans=0.125
2024-08-10 01:36:21,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=299480.0, ans=0.2
2024-08-10 01:36:25,372 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 from AS
2024-08-10 01:36:29,352 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 from AS
2024-08-10 01:36:51,666 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 17 from Vox, 33 from AS
2024-08-10 01:37:00,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0
2024-08-10 01:37:10,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=299780.0, ans=0.125
2024-08-10 01:37:18,786 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1000, loss[loss=0.08704, beats_loss=0.01497, ecapa_loss=0.000231, whisper_loss=0.06976, over 18905.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.0123, ecapa_loss=0.0002816, whisper_loss=0.09845, over 3792646.11 frames.
], batch size: 73, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:37:20,842 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 22 from Vox, 25 from AS
2024-08-10 01:37:21,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=299880.0, ans=0.125
2024-08-10 01:37:37,612 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.271e+01 2.926e+01 3.322e+01 3.689e+01 5.712e+01, threshold=6.643e+01, percent-clipped=0.0
2024-08-10 01:37:45,955 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.428e+00
2024-08-10 01:37:53,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=300080.0, ans=0.2
2024-08-10 01:38:10,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=300180.0, ans=0.125
2024-08-10 01:38:19,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=300280.0, ans=0.0
2024-08-10 01:38:22,746 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS
2024-08-10 01:38:33,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300380.0, ans=0.1
2024-08-10 01:38:34,457 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1050, loss[loss=0.09948, beats_loss=0.008395, ecapa_loss=0.0003115, whisper_loss=0.08798, over 16929.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01223, ecapa_loss=0.0002804, whisper_loss=0.0989, over 3834991.96 frames.
], batch size: 67, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:38:36,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=300380.0, ans=0.0
2024-08-10 01:38:51,323 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 20 from LS+wenet, 13 from Vox, 20 from AS
2024-08-10 01:39:12,939 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 16 from Vox, 35 from AS
2024-08-10 01:39:43,340 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0
2024-08-10 01:39:44,912 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 from AS
2024-08-10 01:39:50,627 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1100, loss[loss=0.09734, beats_loss=0.01345, ecapa_loss=0.0002827, whisper_loss=0.08106, over 22241.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01228, ecapa_loss=0.0002786, whisper_loss=0.09918, over 3857077.12 frames. ], batch size: 91, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:39:51,207 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 from AS
2024-08-10 01:39:57,971 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 from AS
2024-08-10 01:40:08,593 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.873e+01 3.261e+01 3.724e+01 5.464e+01, threshold=6.522e+01, percent-clipped=0.0
2024-08-10 01:40:09,007 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 from AS
2024-08-10 01:40:18,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.09 vs.
limit=15.0
2024-08-10 01:40:46,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0
2024-08-10 01:40:54,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=301280.0, ans=0.0
2024-08-10 01:41:01,864 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 12 from Vox, 35 from AS
2024-08-10 01:41:04,443 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1150, loss[loss=0.09764, beats_loss=0.01515, ecapa_loss=0.0002998, whisper_loss=0.07949, over 23081.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01229, ecapa_loss=0.0002794, whisper_loss=0.09912, over 3845160.93 frames. ], batch size: 94, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:41:16,678 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 from AS
2024-08-10 01:41:22,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=301480.0, ans=0.125
2024-08-10 01:41:26,213 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 25 from Vox, 22 from AS
2024-08-10 01:41:32,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=301580.0, ans=0.125
2024-08-10 01:41:46,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=301580.0, ans=0.125
2024-08-10 01:41:48,771 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 from AS
2024-08-10 01:42:11,458 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 from AS
2024-08-10 01:42:18,074 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts.
23 from LS+wenet, 16 from Vox, 30 from AS
2024-08-10 01:42:19,176 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1200, loss[loss=0.1242, beats_loss=0.01368, ecapa_loss=0.0002927, whisper_loss=0.1076, over 17405.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01236, ecapa_loss=0.0002796, whisper_loss=0.09887, over 3820078.03 frames. ], batch size: 69, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:42:19,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=301880.0, ans=0.5
2024-08-10 01:42:24,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=301880.0, ans=0.0
2024-08-10 01:42:25,124 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 from AS
2024-08-10 01:42:36,905 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.802e+01 3.225e+01 3.750e+01 6.302e+01, threshold=6.450e+01, percent-clipped=0.0
2024-08-10 01:42:45,857 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0
2024-08-10 01:42:52,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.40 vs. limit=22.5
2024-08-10 01:43:07,856 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 11 from Vox, 38 from AS
2024-08-10 01:43:31,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=15.0
2024-08-10 01:43:33,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1250, loss[loss=0.1096, beats_loss=0.009315, ecapa_loss=0.000293, whisper_loss=0.09739, over 16675.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01239, ecapa_loss=0.0002799, whisper_loss=0.09889, over 3825394.07 frames.
], batch size: 61, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:43:37,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302380.0, ans=0.1
2024-08-10 01:43:38,336 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 21 from Vox, 32 from AS
2024-08-10 01:43:38,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=302380.0, ans=0.0
2024-08-10 01:43:43,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=302380.0, ans=0.125
2024-08-10 01:43:44,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=302380.0, ans=0.125
2024-08-10 01:43:57,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=302480.0, ans=0.2
2024-08-10 01:44:06,011 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 13 from Vox, 22 from AS
2024-08-10 01:44:22,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=302680.0, ans=0.0
2024-08-10 01:44:31,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302780.0, ans=0.1
2024-08-10 01:44:48,701 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1300, loss[loss=0.107, beats_loss=0.01329, ecapa_loss=0.0002475, whisper_loss=0.09127, over 21385.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01241, ecapa_loss=0.0002804, whisper_loss=0.09834, over 3801019.96 frames. ], batch size: 85, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:45:06,831 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts.
27 from LS+wenet, 18 from Vox, 44 from AS
2024-08-10 01:45:08,213 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.884e+01 3.264e+01 3.595e+01 5.329e+01, threshold=6.528e+01, percent-clipped=0.0
2024-08-10 01:45:22,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=303080.0, ans=0.125
2024-08-10 01:45:30,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=303080.0, ans=0.0
2024-08-10 01:45:33,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=303080.0, ans=0.125
2024-08-10 01:45:36,593 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 11 from Vox, 31 from AS
2024-08-10 01:46:00,753 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 from AS
2024-08-10 01:46:08,749 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 from AS
2024-08-10 01:46:10,189 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1350, loss[loss=0.1174, beats_loss=0.009755, ecapa_loss=0.0002893, whisper_loss=0.1048, over 19257.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01239, ecapa_loss=0.0002791, whisper_loss=0.09827, over 3830522.88 frames. ], batch size: 75, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:46:10,358 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 from AS
2024-08-10 01:46:27,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=303480.0, ans=0.125
2024-08-10 01:46:52,731 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.67 vs.
limit=15.0
2024-08-10 01:46:52,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0
2024-08-10 01:47:01,224 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 16 from Vox, 44 from AS
2024-08-10 01:47:04,638 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 from AS
2024-08-10 01:47:05,808 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 from AS
2024-08-10 01:47:07,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=303680.0, ans=0.125
2024-08-10 01:47:26,981 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1400, loss[loss=0.08219, beats_loss=0.01576, ecapa_loss=0.0001638, whisper_loss=0.06479, over 14229.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01231, ecapa_loss=0.0002796, whisper_loss=0.09878, over 3825771.24 frames. ], batch size: 54, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:47:44,382 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.178e+01 2.877e+01 3.100e+01 3.641e+01 7.400e+01, threshold=6.199e+01, percent-clipped=1.0
2024-08-10 01:47:49,311 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 from AS
2024-08-10 01:47:49,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=303980.0, ans=0.125
2024-08-10 01:48:05,245 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs.
limit=15.0
2024-08-10 01:48:25,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=304180.0, ans=0.125
2024-08-10 01:48:28,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=304280.0, ans=0.125
2024-08-10 01:48:36,601 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 21 from Vox, 19 from AS
2024-08-10 01:49:10,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1450, loss[loss=0.139, beats_loss=0.009652, ecapa_loss=0.0003067, whisper_loss=0.1263, over 20798.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01227, ecapa_loss=0.0002812, whisper_loss=0.09873, over 3829251.04 frames. ], batch size: 84, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:49:13,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=304380.0, ans=0.0
2024-08-10 01:49:14,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=304380.0, ans=0.125
2024-08-10 01:49:21,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304380.0, ans=0.1
2024-08-10 01:49:48,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=304580.0, ans=0.125
2024-08-10 01:50:08,107 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=12.0
2024-08-10 01:50:29,232 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.562e+00
2024-08-10 01:50:30,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1500, loss[loss=0.1255, beats_loss=0.00894, ecapa_loss=0.0002965, whisper_loss=0.1136, over 20253.00 frames.
], tot_loss[loss=0.1128, beats_loss=0.01238, ecapa_loss=0.0002787, whisper_loss=0.09766, over 3824916.63 frames. ], batch size: 77, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:50:40,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=304880.0, ans=0.125
2024-08-10 01:50:42,410 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 23 from Vox, 30 from AS
2024-08-10 01:50:45,566 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 25 from Vox, 30 from AS
2024-08-10 01:50:49,821 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.825e+01 3.192e+01 3.755e+01 6.662e+01, threshold=6.384e+01, percent-clipped=1.0
2024-08-10 01:51:01,057 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 14 from Vox, 34 from AS
2024-08-10 01:51:05,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=305080.0, ans=0.04949747468305833
2024-08-10 01:51:15,405 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 30 from LS+wenet, 13 from Vox, 29 from AS
2024-08-10 01:51:32,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=305280.0, ans=0.125
2024-08-10 01:51:32,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=305280.0, ans=0.1
2024-08-10 01:51:45,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=305280.0, ans=0.0
2024-08-10 01:51:48,657 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1550, loss[loss=0.1286, beats_loss=0.008995, ecapa_loss=0.0003193, whisper_loss=0.1164, over 14660.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.0124, ecapa_loss=0.0002787, whisper_loss=0.09805, over 3808892.57 frames.
], batch size: 56, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:51:52,226 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0
2024-08-10 01:51:53,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=305380.0, ans=0.1
2024-08-10 01:51:59,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.14 vs. limit=15.0
2024-08-10 01:52:00,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=305380.0, ans=0.1
2024-08-10 01:52:10,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.98 vs. limit=15.0
2024-08-10 01:52:13,300 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 from AS
2024-08-10 01:52:33,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=305580.0, ans=0.125
2024-08-10 01:52:38,359 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.88 vs. limit=15.0
2024-08-10 01:52:46,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.60 vs. limit=22.5
2024-08-10 01:52:57,230 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 from AS
2024-08-10 01:53:07,953 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1600, loss[loss=0.1009, beats_loss=0.01324, ecapa_loss=0.000249, whisper_loss=0.08517, over 15893.00 frames.
], tot_loss[loss=0.1131, beats_loss=0.01245, ecapa_loss=0.0002775, whisper_loss=0.09785, over 3806494.11 frames. ], batch size: 62, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:53:27,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.961e+01 3.443e+01 4.067e+01 6.226e+01, threshold=6.887e+01, percent-clipped=0.0
2024-08-10 01:53:36,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=305980.0, ans=0.125
2024-08-10 01:53:48,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0
2024-08-10 01:54:15,697 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0
2024-08-10 01:54:26,983 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1650, loss[loss=0.1095, beats_loss=0.01418, ecapa_loss=0.0002522, whisper_loss=0.09278, over 23070.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.0125, ecapa_loss=0.0002775, whisper_loss=0.09763, over 3832006.88 frames. ], batch size: 91, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 01:54:43,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=306480.0, ans=0.125
2024-08-10 01:54:57,738 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2024-08-10 01:55:08,087 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 from AS
2024-08-10 01:55:10,544 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs.
limit=15.0
2024-08-10 01:55:11,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=306580.0, ans=0.2
2024-08-10 01:55:20,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=306680.0, ans=0.2
2024-08-10 01:55:32,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=306780.0, ans=0.125
2024-08-10 01:55:38,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306780.0, ans=0.1
2024-08-10 01:55:43,826 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1700, loss[loss=0.08682, beats_loss=0.012, ecapa_loss=0.0002555, whisper_loss=0.07227, over 16652.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01239, ecapa_loss=0.0002807, whisper_loss=0.09835, over 3840823.45 frames. ], batch size: 66, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 01:55:48,870 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 33 from LS+wenet, 26 from Vox, 36 from AS
2024-08-10 01:55:49,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306880.0, ans=0.1
2024-08-10 01:56:01,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.503e+01 3.006e+01 3.281e+01 3.850e+01 2.955e+02, threshold=6.563e+01, percent-clipped=2.0
2024-08-10 01:56:06,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=306980.0, ans=0.0
2024-08-10 01:56:14,224 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
30 from LS+wenet, 20 from Vox, 37 from AS
2024-08-10 01:56:17,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=307080.0, ans=0.2
2024-08-10 01:56:20,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=307080.0, ans=0.07
2024-08-10 01:56:57,894 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1750, loss[loss=0.1134, beats_loss=0.01358, ecapa_loss=0.0002353, whisper_loss=0.09746, over 15000.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01239, ecapa_loss=0.0002783, whisper_loss=0.09791, over 3846682.20 frames. ], batch size: 58, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 01:57:03,603 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 18 from Vox, 34 from AS
2024-08-10 01:57:26,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=307580.0, ans=0.1
2024-08-10 01:57:27,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=307580.0, ans=0.125
2024-08-10 01:57:30,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=307580.0, ans=0.2
2024-08-10 01:57:36,266 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 22 from LS+wenet, 16 from Vox, 16 from AS
2024-08-10 01:57:46,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307680.0, ans=0.1
2024-08-10 01:57:57,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=307780.0, ans=0.2
2024-08-10 01:58:05,214 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts.
31 from LS+wenet, 18 from Vox, 40 from AS
2024-08-10 01:58:09,332 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1800, loss[loss=0.1326, beats_loss=0.009179, ecapa_loss=0.0002984, whisper_loss=0.1205, over 15610.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01228, ecapa_loss=0.0002773, whisper_loss=0.09816, over 3825825.49 frames. ], batch size: 61, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 01:58:26,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.751e+01 3.157e+01 3.582e+01 5.631e+01, threshold=6.314e+01, percent-clipped=0.0
2024-08-10 01:58:50,828 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 from AS
2024-08-10 01:59:11,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=308280.0, ans=0.125
2024-08-10 01:59:11,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308280.0, ans=0.1
2024-08-10 01:59:20,412 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1850, loss[loss=0.1155, beats_loss=0.01079, ecapa_loss=0.0002883, whisper_loss=0.1018, over 15904.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.0122, ecapa_loss=0.0002812, whisper_loss=0.09926, over 3831459.60 frames. ], batch size: 62, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 01:59:20,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308380.0, ans=0.1
2024-08-10 01:59:23,403 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 from AS
2024-08-10 01:59:25,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=308380.0, ans=0.125
2024-08-10 01:59:31,069 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.03 vs.
limit=22.5
2024-08-10 01:59:34,940 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 21 from Vox, 35 from AS
2024-08-10 01:59:41,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=308480.0, ans=0.2
2024-08-10 01:59:53,642 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-10 01:59:54,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=308580.0, ans=0.0
2024-08-10 01:59:54,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=308580.0, ans=0.125
2024-08-10 01:59:57,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.03 vs. limit=22.5
2024-08-10 02:00:08,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0
2024-08-10 02:00:14,949 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 19 from Vox, 23 from AS
2024-08-10 02:00:15,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=308780.0, ans=0.125
2024-08-10 02:00:21,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=308780.0, ans=0.0
2024-08-10 02:00:23,418 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 from AS
2024-08-10 02:00:30,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1900, loss[loss=0.09942, beats_loss=0.01137, ecapa_loss=0.0002535, whisper_loss=0.08552, over 15658.00 frames.
], tot_loss[loss=0.1148, beats_loss=0.01223, ecapa_loss=0.0002852, whisper_loss=0.0997, over 3833719.83 frames. ], batch size: 59, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 02:00:30,753 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 from AS
2024-08-10 02:00:37,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.99 vs. limit=12.0
2024-08-10 02:00:42,984 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.57 vs. limit=15.0
2024-08-10 02:00:46,962 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.885e+03
2024-08-10 02:00:47,777 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.899e+01 3.416e+01 4.271e+01 7.702e+01, threshold=6.832e+01, percent-clipped=2.0
2024-08-10 02:00:51,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=308980.0, ans=0.2
2024-08-10 02:00:52,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=308980.0, ans=0.2
2024-08-10 02:00:53,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308980.0, ans=0.1
2024-08-10 02:00:53,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=308980.0, ans=0.1
2024-08-10 02:01:00,114 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
25 from LS+wenet, 23 from Vox, 46 from AS
2024-08-10 02:01:00,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=309080.0, ans=0.0
2024-08-10 02:01:04,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=309080.0, ans=0.125
2024-08-10 02:01:11,713 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.01 vs. limit=22.5
2024-08-10 02:01:12,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=309180.0, ans=0.0
2024-08-10 02:01:25,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.06 vs. limit=15.0
2024-08-10 02:01:31,551 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 from AS
2024-08-10 02:01:33,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=309280.0, ans=0.0
2024-08-10 02:01:39,573 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 1950, loss[loss=0.1295, beats_loss=0.01128, ecapa_loss=0.0003634, whisper_loss=0.1146, over 20238.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.0123, ecapa_loss=0.0002894, whisper_loss=0.09917, over 3828883.60 frames. ], batch size: 84, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 02:01:50,683 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts.
18 from LS+wenet, 20 from Vox, 21 from AS 2024-08-10 02:02:02,861 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.123e-02 2024-08-10 02:02:11,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=309580.0, ans=0.125 2024-08-10 02:02:14,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.47 vs. limit=10.0 2024-08-10 02:02:23,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=309680.0, ans=0.0 2024-08-10 02:02:23,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309680.0, ans=0.1 2024-08-10 02:02:31,147 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 from AS 2024-08-10 02:02:44,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309780.0, ans=0.1 2024-08-10 02:02:50,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=309880.0, ans=0.0 2024-08-10 02:02:51,091 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2000, loss[loss=0.1374, beats_loss=0.008102, ecapa_loss=0.0003055, whisper_loss=0.1263, over 19558.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01228, ecapa_loss=0.0002907, whisper_loss=0.09886, over 3809550.71 frames.
], batch size: 74, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:02:57,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=309880.0, ans=0.0 2024-08-10 02:02:59,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=309880.0, ans=0.5 2024-08-10 02:03:09,441 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.384e+01 2.983e+01 3.552e+01 3.984e+01 6.262e+01, threshold=7.103e+01, percent-clipped=0.0 2024-08-10 02:03:11,346 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 02:03:12,128 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.01 vs. limit=15.0 2024-08-10 02:03:12,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=309980.0, ans=0.125 2024-08-10 02:03:19,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=310080.0, ans=0.125 2024-08-10 02:03:28,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=310080.0, ans=0.2 2024-08-10 02:03:51,113 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 17 from Vox, 36 from AS 2024-08-10 02:04:03,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2050, loss[loss=0.1436, beats_loss=0.008649, ecapa_loss=0.000393, whisper_loss=0.131, over 18659.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01237, ecapa_loss=0.000292, whisper_loss=0.09869, over 3857474.54 frames.
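The `[optim.py:476]` lines report grad-norm quartiles together with a clipping threshold. In every such entry in this log the threshold equals `Clipping_scale` times the median quartile (e.g. 2.0 × 3.552e+01 = 7.103e+01), so a plausible sketch of how these statistics could be computed from a window of recent gradient norms follows; the window mechanics and exact clipping rule in `optim.py` are assumptions:

```python
import numpy as np

def grad_norm_stats(grad_norms, clipping_scale=2.0):
    """Quartiles of recent gradient norms plus a clipping threshold.

    threshold = clipping_scale * median, which matches every
    [optim.py:476] line in this log; percent-clipped is the share of
    norms in the window that exceeded the threshold.
    """
    norms = np.asarray(grad_norms, dtype=float)
    quartiles = np.quantile(norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * quartiles[2]
    percent_clipped = 100.0 * float(np.mean(norms > threshold))
    return quartiles, threshold, percent_clipped
```

Reading the log through this lens: a `percent-clipped=2.0` line means 2% of the gradient norms in the current window exceeded twice the median norm and were scaled down.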
], batch size: 73, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:04:11,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=310380.0, ans=0.125 2024-08-10 02:04:11,701 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=15.0 2024-08-10 02:04:16,387 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 25 from Vox, 31 from AS 2024-08-10 02:04:24,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=310480.0, ans=10.0 2024-08-10 02:04:25,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=310480.0, ans=0.1 2024-08-10 02:04:29,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=310480.0, ans=0.2 2024-08-10 02:04:30,750 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 26 from Vox, 36 from AS 2024-08-10 02:04:33,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=310580.0, ans=0.125 2024-08-10 02:04:37,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=310580.0, ans=0.125 2024-08-10 02:05:01,321 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 14 from Vox, 47 from AS 2024-08-10 02:05:13,001 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2100, loss[loss=0.1099, beats_loss=0.01356, ecapa_loss=0.0003086, whisper_loss=0.09321, over 23523.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01242, ecapa_loss=0.0002951, whisper_loss=0.09879, over 3848944.46 frames.
], batch size: 95, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:05:16,797 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=1.96 vs. limit=15.0 2024-08-10 02:05:22,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0 2024-08-10 02:05:24,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=310880.0, ans=0.125 2024-08-10 02:05:25,975 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2024-08-10 02:05:29,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.901e+01 3.264e+01 3.705e+01 5.595e+01, threshold=6.528e+01, percent-clipped=0.0 2024-08-10 02:05:31,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=310980.0, ans=0.0 2024-08-10 02:05:31,763 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.90 vs. limit=10.0 2024-08-10 02:05:32,421 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 16 from Vox, 30 from AS 2024-08-10 02:05:37,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=310980.0, ans=0.5 2024-08-10 02:05:38,459 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 15 from Vox, 42 from AS 2024-08-10 02:05:56,470 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts.
30 from LS+wenet, 21 from Vox, 26 from AS 2024-08-10 02:06:03,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=311180.0, ans=0.125 2024-08-10 02:06:17,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=311280.0, ans=0.2 2024-08-10 02:06:22,672 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.01 vs. limit=15.0 2024-08-10 02:06:23,174 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2150, loss[loss=0.1189, beats_loss=0.01251, ecapa_loss=0.0002769, whisper_loss=0.1036, over 23200.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01242, ecapa_loss=0.0002949, whisper_loss=0.09927, over 3861879.79 frames. ], batch size: 92, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:06:29,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=311380.0, ans=0.125 2024-08-10 02:06:35,362 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 24 from Vox, 25 from AS 2024-08-10 02:06:46,687 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.36 vs. limit=6.0 2024-08-10 02:06:57,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=311580.0, ans=0.125 2024-08-10 02:07:27,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2024-08-10 02:07:38,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.84 vs.
limit=15.0 2024-08-10 02:07:38,366 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2200, loss[loss=0.08827, beats_loss=0.01455, ecapa_loss=0.0002717, whisper_loss=0.07101, over 15732.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.0124, ecapa_loss=0.0002936, whisper_loss=0.09977, over 3849828.89 frames. ], batch size: 64, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:07:55,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.913e+01 3.407e+01 3.904e+01 7.612e+01, threshold=6.814e+01, percent-clipped=1.0 2024-08-10 02:08:06,274 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=15.0 2024-08-10 02:08:11,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=312080.0, ans=0.125 2024-08-10 02:08:13,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=312080.0, ans=0.0 2024-08-10 02:08:15,487 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 from AS 2024-08-10 02:08:22,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=312180.0, ans=0.0 2024-08-10 02:08:23,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=312180.0, ans=0.125 2024-08-10 02:08:43,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=312280.0, ans=0.125 2024-08-10 02:08:46,560 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts.
26 from LS+wenet, 20 from Vox, 39 from AS 2024-08-10 02:08:50,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2250, loss[loss=0.1244, beats_loss=0.01285, ecapa_loss=0.0002922, whisper_loss=0.1087, over 22744.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01241, ecapa_loss=0.0002947, whisper_loss=0.1003, over 3850688.26 frames. ], batch size: 90, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:08:51,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=312380.0, ans=0.125 2024-08-10 02:09:06,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=312480.0, ans=0.2 2024-08-10 02:09:08,761 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 from AS 2024-08-10 02:09:14,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=312480.0, ans=0.125 2024-08-10 02:09:14,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=312480.0, ans=0.125 2024-08-10 02:09:47,703 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=15.0 2024-08-10 02:10:01,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=312780.0, ans=0.0 2024-08-10 02:10:03,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2300, loss[loss=0.09143, beats_loss=0.01487, ecapa_loss=0.0002678, whisper_loss=0.07387, over 21950.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01235, ecapa_loss=0.0002965, whisper_loss=0.1015, over 3849888.85 frames. ], batch size: 90, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:10:08,070 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
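The many `ScheduledFloat` lines record hyperparameters (dropout rates, balancer probabilities, skip rates) whose values are annealed as a function of `batch_count`. A minimal piecewise-linear sketch in the spirit of `ScheduledFloat` from `scaling.py`; the breakpoints in the example are invented for illustration and are not taken from this recipe:

```python
def scheduled_float(batch_count, schedule):
    """Piecewise-linear interpolation between (batch_count, value)
    breakpoints, clamped at both ends; mimics how a ScheduledFloat
    hyperparameter is annealed over training."""
    points = sorted(schedule)
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Hypothetical schedule: start at 0.5, decay to 0.125 by batch 4000;
# by batch_count=312480 (as logged above) the value is pinned at the end.
```

This explains why the logged `ans=` values (0.125, 0.1, 0.2, 0.0, ...) are constant across nearby batches here: at batch_count ≈ 3.1e5 the schedules have long since reached their final breakpoints.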
34 from LS+wenet, 17 from Vox, 36 from AS 2024-08-10 02:10:10,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=312880.0, ans=0.0 2024-08-10 02:10:21,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 3.060e+01 3.416e+01 3.893e+01 7.548e+01, threshold=6.833e+01, percent-clipped=2.0 2024-08-10 02:10:26,271 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.905e-02 2024-08-10 02:10:36,379 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-10 02:10:37,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2024-08-10 02:11:10,836 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 21 from Vox, 25 from AS 2024-08-10 02:11:14,756 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2350, loss[loss=0.1141, beats_loss=0.0123, ecapa_loss=0.0002823, whisper_loss=0.09896, over 13854.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01227, ecapa_loss=0.0002977, whisper_loss=0.1014, over 3842778.00 frames. ], batch size: 54, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:11:28,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=313480.0, ans=0.0 2024-08-10 02:11:28,569 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2024-08-10 02:11:40,135 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 26 from Vox, 37 from AS 2024-08-10 02:11:41,723 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts.
18 from LS+wenet, 17 from Vox, 19 from AS 2024-08-10 02:11:42,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=313580.0, ans=0.07 2024-08-10 02:11:52,600 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 27 from Vox, 29 from AS 2024-08-10 02:11:52,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=313580.0, ans=0.2 2024-08-10 02:11:57,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=313680.0, ans=0.125 2024-08-10 02:12:06,681 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-08-10 02:12:11,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=313680.0, ans=0.125 2024-08-10 02:12:18,298 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 from AS 2024-08-10 02:12:28,479 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2400, loss[loss=0.1117, beats_loss=0.01022, ecapa_loss=0.0003124, whisper_loss=0.09834, over 15062.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.0122, ecapa_loss=0.0002991, whisper_loss=0.1016, over 3849082.73 frames. ], batch size: 58, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:12:28,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=313880.0, ans=0.0 2024-08-10 02:12:29,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=313880.0, ans=0.0 2024-08-10 02:12:30,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs.
limit=15.0 2024-08-10 02:12:31,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=313880.0, ans=0.0 2024-08-10 02:12:44,961 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+01 3.008e+01 3.355e+01 4.317e+01 6.888e+01, threshold=6.709e+01, percent-clipped=1.0 2024-08-10 02:13:02,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=314080.0, ans=0.1 2024-08-10 02:13:08,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=314080.0, ans=0.2 2024-08-10 02:13:11,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314180.0, ans=0.1 2024-08-10 02:13:37,110 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 12 from Vox, 28 from AS 2024-08-10 02:13:37,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-08-10 02:13:40,406 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2450, loss[loss=0.1134, beats_loss=0.01414, ecapa_loss=0.0002763, whisper_loss=0.09645, over 21044.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01222, ecapa_loss=0.000299, whisper_loss=0.1012, over 3837291.91 frames.
], batch size: 85, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:13:43,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=314380.0, ans=0.1 2024-08-10 02:14:04,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=314480.0, ans=10.0 2024-08-10 02:14:04,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=314480.0, ans=0.025 2024-08-10 02:14:14,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=314580.0, ans=0.125 2024-08-10 02:14:21,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=314580.0, ans=0.125 2024-08-10 02:14:30,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=314680.0, ans=0.125 2024-08-10 02:14:31,373 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 33 from Vox, 34 from AS 2024-08-10 02:14:31,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=314680.0, ans=0.2 2024-08-10 02:14:35,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=314680.0, ans=0.125 2024-08-10 02:14:44,613 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 33 from Vox, 26 from AS 2024-08-10 02:14:54,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2500, loss[loss=0.1129, beats_loss=0.01217, ecapa_loss=0.0003043, whisper_loss=0.09766, over 22595.00 frames. ], tot_loss[loss=0.116, beats_loss=0.0122, ecapa_loss=0.0002996, whisper_loss=0.1009, over 3859161.26 frames.
], batch size: 91, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:15:12,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 3.053e+01 3.458e+01 4.005e+01 5.985e+01, threshold=6.915e+01, percent-clipped=0.0 2024-08-10 02:15:19,918 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 30 from Vox, 31 from AS 2024-08-10 02:15:37,057 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.56 vs. limit=15.0 2024-08-10 02:15:44,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=315180.0, ans=0.0 2024-08-10 02:15:47,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=315180.0, ans=0.0 2024-08-10 02:15:52,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=315280.0, ans=0.0 2024-08-10 02:15:52,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-10 02:16:07,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2550, loss[loss=0.1218, beats_loss=0.01204, ecapa_loss=0.0003135, whisper_loss=0.1067, over 18521.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01209, ecapa_loss=0.000299, whisper_loss=0.1018, over 3884692.92 frames. ], batch size: 71, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:16:08,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=315380.0, ans=0.125 2024-08-10 02:16:12,205 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 30 from LS+wenet, 22 from Vox, 24 from AS 2024-08-10 02:16:29,580 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts.
16 from LS+wenet, 17 from Vox, 35 from AS 2024-08-10 02:16:34,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=315480.0, ans=0.125 2024-08-10 02:16:37,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=15.0 2024-08-10 02:16:38,863 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0 2024-08-10 02:16:48,547 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 from AS 2024-08-10 02:16:48,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=315580.0, ans=0.125 2024-08-10 02:17:13,445 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.01 vs. limit=15.0 2024-08-10 02:17:19,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=315880.0, ans=0.125 2024-08-10 02:17:19,659 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-08-10 02:17:20,804 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2600, loss[loss=0.1253, beats_loss=0.01292, ecapa_loss=0.0003502, whisper_loss=0.1089, over 22167.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01215, ecapa_loss=0.0002996, whisper_loss=0.1007, over 3851095.17 frames. ], batch size: 93, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:17:24,765 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.73 vs.
limit=15.0 2024-08-10 02:17:28,264 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 from AS 2024-08-10 02:17:34,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=315980.0, ans=0.125 2024-08-10 02:17:38,497 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 2.748e+01 3.170e+01 3.706e+01 6.461e+01, threshold=6.341e+01, percent-clipped=0.0 2024-08-10 02:17:40,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=315980.0, ans=0.125 2024-08-10 02:17:40,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=315980.0, ans=0.125 2024-08-10 02:17:53,379 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS 2024-08-10 02:17:55,369 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.27 vs. limit=6.0 2024-08-10 02:18:05,735 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 02:18:16,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=316180.0, ans=0.2 2024-08-10 02:18:20,287 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 from AS 2024-08-10 02:18:38,118 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2650, loss[loss=0.09388, beats_loss=0.01541, ecapa_loss=0.000297, whisper_loss=0.0755, over 17038.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01212, ecapa_loss=0.0003018, whisper_loss=0.1007, over 3845328.79 frames.
], batch size: 70, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:18:42,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=316380.0, ans=0.125 2024-08-10 02:19:05,780 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 from AS 2024-08-10 02:19:16,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=316580.0, ans=0.125 2024-08-10 02:19:22,977 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2024-08-10 02:19:52,120 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.98 vs. limit=15.0 2024-08-10 02:19:54,366 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2700, loss[loss=0.1054, beats_loss=0.01458, ecapa_loss=0.0003185, whisper_loss=0.0876, over 18124.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01214, ecapa_loss=0.0003035, whisper_loss=0.1007, over 3853910.37 frames. ], batch size: 75, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:20:01,139 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.69 vs.
limit=15.0 2024-08-10 02:20:06,204 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.720e-01 2024-08-10 02:20:09,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=316980.0, ans=0.1 2024-08-10 02:20:11,906 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.947e+01 3.317e+01 3.968e+01 5.790e+01, threshold=6.635e+01, percent-clipped=0.0 2024-08-10 02:20:36,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=317080.0, ans=0.125 2024-08-10 02:20:39,865 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2024-08-10 02:20:40,419 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 from AS 2024-08-10 02:20:55,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=12.0 2024-08-10 02:21:07,859 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2750, loss[loss=0.1226, beats_loss=0.01099, ecapa_loss=0.0003205, whisper_loss=0.1084, over 17354.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.0121, ecapa_loss=0.0003028, whisper_loss=0.1005, over 3820109.67 frames. ], batch size: 67, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:21:15,775 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 from AS 2024-08-10 02:21:22,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.83 vs. limit=22.5 2024-08-10 02:21:25,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.07 vs.
limit=15.0 2024-08-10 02:22:04,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=317680.0, ans=0.1 2024-08-10 02:22:21,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=317780.0, ans=0.125 2024-08-10 02:22:24,274 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2800, loss[loss=0.1112, beats_loss=0.01261, ecapa_loss=0.0002889, whisper_loss=0.09572, over 20416.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01222, ecapa_loss=0.0003002, whisper_loss=0.1001, over 3828120.74 frames. ], batch size: 79, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:22:26,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=317880.0, ans=0.09899494936611666 2024-08-10 02:22:43,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 3.037e+01 3.440e+01 4.229e+01 1.125e+02, threshold=6.879e+01, percent-clipped=1.0 2024-08-10 02:22:49,560 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
28 from LS+wenet, 17 from Vox, 32 from AS 2024-08-10 02:22:54,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=318080.0, ans=0.0 2024-08-10 02:23:06,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=318080.0, ans=0.2 2024-08-10 02:23:18,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=318180.0, ans=0.125 2024-08-10 02:23:18,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=318180.0, ans=0.125 2024-08-10 02:23:22,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=318180.0, ans=0.0 2024-08-10 02:23:25,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318280.0, ans=0.1 2024-08-10 02:23:39,711 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2850, loss[loss=0.1486, beats_loss=0.01066, ecapa_loss=0.0002801, whisper_loss=0.1351, over 23145.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01225, ecapa_loss=0.0003002, whisper_loss=0.1005, over 3838488.53 frames. ], batch size: 90, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:23:39,836 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 from AS 2024-08-10 02:23:42,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=318380.0, ans=0.0 2024-08-10 02:23:46,667 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 from AS 2024-08-10 02:23:48,035 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
32 from LS+wenet, 23 from Vox, 37 from AS 2024-08-10 02:24:44,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=318780.0, ans=0.2 2024-08-10 02:24:48,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=318780.0, ans=0.125 2024-08-10 02:24:52,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=318780.0, ans=0.125 2024-08-10 02:24:55,633 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 from AS 2024-08-10 02:25:01,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2900, loss[loss=0.1382, beats_loss=0.01128, ecapa_loss=0.0003031, whisper_loss=0.1239, over 22689.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.0124, ecapa_loss=0.0002989, whisper_loss=0.09991, over 3839751.34 frames. ], batch size: 89, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:25:04,579 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 16 from Vox, 33 from AS 2024-08-10 02:25:05,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=318880.0, ans=0.125 2024-08-10 02:25:09,535 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 18 from Vox, 36 from AS 2024-08-10 02:25:09,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=318880.0, ans=0.125 2024-08-10 02:25:14,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs.
limit=15.0 2024-08-10 02:25:19,871 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.293e+01 2.971e+01 3.564e+01 4.159e+01 7.122e+01, threshold=7.127e+01, percent-clipped=1.0 2024-08-10 02:25:22,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=318980.0, ans=0.5 2024-08-10 02:25:28,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318980.0, ans=0.1 2024-08-10 02:25:56,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=319180.0, ans=0.0 2024-08-10 02:26:00,910 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 14 from Vox, 50 from AS 2024-08-10 02:26:07,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=319280.0, ans=0.125 2024-08-10 02:26:13,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319280.0, ans=0.1 2024-08-10 02:26:17,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 2950, loss[loss=0.1179, beats_loss=0.011, ecapa_loss=0.0003405, whisper_loss=0.1035, over 22715.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01245, ecapa_loss=0.0002972, whisper_loss=0.09947, over 3858074.29 frames.
], batch size: 91, lr: 2.00e-02, grad_scale: 1048576.0 2024-08-10 02:26:36,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319480.0, ans=0.1 2024-08-10 02:26:52,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=319580.0, ans=0.125 2024-08-10 02:26:56,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=319680.0, ans=0.125 2024-08-10 02:27:05,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2024-08-10 02:27:14,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=319780.0, ans=0.125 2024-08-10 02:27:23,996 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3000, loss[loss=0.1408, beats_loss=0.01258, ecapa_loss=0.0003146, whisper_loss=0.1251, over 21911.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01256, ecapa_loss=0.0002978, whisper_loss=0.09871, over 3891947.39 frames. ], batch size: 90, lr: 2.00e-02, grad_scale: 1048576.0 2024-08-10 02:27:23,996 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 02:28:04,471 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on ASR_libri: loss=0.2772, beats_loss=0, ecapa_loss=0.0008938, whisper_loss=0.2682, over 922467.00 frames. 2024-08-10 02:28:22,931 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on SV_voxceleb1: loss=0.007832, beats_loss=0, ecapa_loss=0.0007832, whisper_loss=0, over 939242.00 frames. 2024-08-10 02:30:19,800 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on AT_audioset: loss=0.02861, beats_loss=0.02861, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 02:30:19,804 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 02:30:22,863 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 from AS 2024-08-10 02:30:24,249 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 24 from Vox, 44 from AS 2024-08-10 02:30:28,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=319880.0, ans=0.125 2024-08-10 02:30:38,977 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 2.871e+01 3.251e+01 3.853e+01 5.451e+01, threshold=6.502e+01, percent-clipped=0.0 2024-08-10 02:30:43,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=319980.0, ans=0.125 2024-08-10 02:30:47,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=319980.0, ans=0.025 2024-08-10 02:30:48,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=320080.0, ans=0.0 2024-08-10 02:30:49,824 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 26 from Vox, 32 from AS 2024-08-10 02:30:51,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=320080.0, ans=0.1 2024-08-10 02:30:55,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=320080.0, ans=0.0 2024-08-10 02:31:12,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs.
limit=15.0 2024-08-10 02:31:21,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=320280.0, ans=0.125 2024-08-10 02:31:23,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320280.0, ans=0.1 2024-08-10 02:31:30,926 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3050, loss[loss=0.1311, beats_loss=0.0132, ecapa_loss=0.00025, whisper_loss=0.1154, over 22868.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01246, ecapa_loss=0.0002976, whisper_loss=0.1001, over 3929960.55 frames. ], batch size: 91, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:31:31,095 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 from AS 2024-08-10 02:31:31,516 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.536e-02 2024-08-10 02:31:43,445 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 30 from LS+wenet, 23 from Vox, 43 from AS 2024-08-10 02:32:22,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.21 vs. limit=22.5 2024-08-10 02:32:33,189 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 25 from Vox, 29 from AS 2024-08-10 02:32:35,330 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.44 vs. limit=22.5 2024-08-10 02:32:38,922 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 16 from Vox, 40 from AS 2024-08-10 02:32:40,000 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3100, loss[loss=0.1222, beats_loss=0.01263, ecapa_loss=0.000224, whisper_loss=0.1073, over 22966.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01248, ecapa_loss=0.000297, whisper_loss=0.09994, over 3928143.85 frames.
], batch size: 88, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:32:56,527 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 2.934e+01 3.353e+01 3.892e+01 7.432e+01, threshold=6.707e+01, percent-clipped=2.0 2024-08-10 02:32:56,861 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 from AS 2024-08-10 02:32:58,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=320980.0, ans=0.125 2024-08-10 02:32:59,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320980.0, ans=0.1 2024-08-10 02:33:07,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=321080.0, ans=0.125 2024-08-10 02:33:13,404 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS 2024-08-10 02:33:24,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=321180.0, ans=0.125 2024-08-10 02:33:28,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=321180.0, ans=0.125 2024-08-10 02:33:44,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=321280.0, ans=0.125 2024-08-10 02:33:48,417 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3150, loss[loss=0.124, beats_loss=0.01001, ecapa_loss=0.0003232, whisper_loss=0.1108, over 17030.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01246, ecapa_loss=0.0002969, whisper_loss=0.09986, over 3932089.36 frames.
], batch size: 67, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:34:19,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=321580.0, ans=0.0 2024-08-10 02:34:23,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=321580.0, ans=0.0 2024-08-10 02:34:26,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=321580.0, ans=0.0 2024-08-10 02:34:28,634 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 from AS 2024-08-10 02:34:42,485 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 from AS 2024-08-10 02:34:50,675 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 from AS 2024-08-10 02:34:53,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=321780.0, ans=0.0 2024-08-10 02:34:53,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=321780.0, ans=0.0 2024-08-10 02:34:57,456 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3200, loss[loss=0.1191, beats_loss=0.01559, ecapa_loss=0.0002476, whisper_loss=0.101, over 18520.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01251, ecapa_loss=0.0002964, whisper_loss=0.09983, over 3952388.44 frames. ], batch size: 73, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:34:57,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=321880.0, ans=0.2 2024-08-10 02:35:01,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=321880.0, ans=0.125 2024-08-10 02:35:08,047 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts.
19 from LS+wenet, 23 from Vox, 29 from AS 2024-08-10 02:35:13,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.260e+01 2.789e+01 3.261e+01 3.853e+01 5.155e+01, threshold=6.521e+01, percent-clipped=0.0 2024-08-10 02:35:22,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2024-08-10 02:35:23,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=322080.0, ans=0.0 2024-08-10 02:35:25,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=322080.0, ans=0.125 2024-08-10 02:35:27,044 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2024-08-10 02:35:40,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=322180.0, ans=0.0 2024-08-10 02:35:46,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=322180.0, ans=0.0 2024-08-10 02:36:04,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=322280.0, ans=0.125 2024-08-10 02:36:06,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3250, loss[loss=0.1, beats_loss=0.01136, ecapa_loss=0.0003376, whisper_loss=0.08531, over 15269.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01241, ecapa_loss=0.0002971, whisper_loss=0.1002, over 3932603.72 frames.
], batch size: 63, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:36:12,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=322380.0, ans=0.0 2024-08-10 02:36:16,760 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.103e-02 2024-08-10 02:36:27,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0 2024-08-10 02:36:48,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=322680.0, ans=0.2 2024-08-10 02:37:07,006 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 from AS 2024-08-10 02:37:09,043 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.53 vs. limit=22.5 2024-08-10 02:37:15,010 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3300, loss[loss=0.1098, beats_loss=0.01048, ecapa_loss=0.0003338, whisper_loss=0.096, over 20973.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01247, ecapa_loss=0.0002972, whisper_loss=0.09984, over 3917381.33 frames. ], batch size: 87, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:37:31,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.021e+01 3.431e+01 4.015e+01 7.071e+01, threshold=6.862e+01, percent-clipped=2.0 2024-08-10 02:37:31,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=322980.0, ans=0.125 2024-08-10 02:37:40,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=322980.0, ans=0.0 2024-08-10 02:37:41,658 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts.
24 from LS+wenet, 24 from Vox, 33 from AS 2024-08-10 02:37:43,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=323080.0, ans=0.125 2024-08-10 02:37:46,203 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 15 from Vox, 34 from AS 2024-08-10 02:37:50,718 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.14 vs. limit=15.0 2024-08-10 02:37:51,278 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 15 from LS+wenet, 18 from Vox, 34 from AS 2024-08-10 02:38:01,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=323180.0, ans=0.0 2024-08-10 02:38:01,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=323180.0, ans=0.125 2024-08-10 02:38:19,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2024-08-10 02:38:20,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=323280.0, ans=0.0 2024-08-10 02:38:23,797 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3350, loss[loss=0.08843, beats_loss=0.01404, ecapa_loss=0.0002658, whisper_loss=0.07173, over 22986.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01241, ecapa_loss=0.0002996, whisper_loss=0.09991, over 3896023.22 frames. ], batch size: 92, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:38:23,958 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 32 from Vox, 34 from AS 2024-08-10 02:38:40,278 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
25 from LS+wenet, 21 from Vox, 42 from AS 2024-08-10 02:39:11,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=323680.0, ans=0.0 2024-08-10 02:39:18,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=323780.0, ans=0.1 2024-08-10 02:39:31,264 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3400, loss[loss=0.1105, beats_loss=0.009232, ecapa_loss=0.0003542, whisper_loss=0.09768, over 14863.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.0123, ecapa_loss=0.0002982, whisper_loss=0.1009, over 3879151.09 frames. ], batch size: 60, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:39:47,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+01 2.812e+01 3.293e+01 3.899e+01 6.283e+01, threshold=6.585e+01, percent-clipped=0.0 2024-08-10 02:39:54,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=323980.0, ans=0.125 2024-08-10 02:40:03,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=324080.0, ans=0.125 2024-08-10 02:40:09,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=324080.0, ans=0.0 2024-08-10 02:40:09,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=324080.0, ans=0.2 2024-08-10 02:40:34,530 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 from AS 2024-08-10 02:40:39,478 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3450, loss[loss=0.1453, beats_loss=0.0101, ecapa_loss=0.0003744, whisper_loss=0.1315, over 22688.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01227, ecapa_loss=0.0003013, whisper_loss=0.1007, over 3891370.91 frames.
], batch size: 91, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:41:28,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=324680.0, ans=0.125 2024-08-10 02:41:28,691 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.154e+05 2024-08-10 02:41:32,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=324680.0, ans=0.125 2024-08-10 02:41:48,676 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3500, loss[loss=0.1208, beats_loss=0.01306, ecapa_loss=0.0002653, whisper_loss=0.1051, over 23464.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01232, ecapa_loss=0.0003004, whisper_loss=0.1004, over 3898900.77 frames. ], batch size: 94, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:41:49,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=324880.0, ans=0.2 2024-08-10 02:41:56,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=324880.0, ans=0.2 2024-08-10 02:41:58,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=324880.0, ans=0.2 2024-08-10 02:42:03,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=324980.0, ans=0.125 2024-08-10 02:42:05,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 3.058e+01 3.643e+01 4.338e+01 7.554e+01, threshold=7.285e+01, percent-clipped=1.0 2024-08-10 02:42:05,576 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
22 from LS+wenet, 13 from Vox, 20 from AS 2024-08-10 02:42:22,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=325080.0, ans=0.025 2024-08-10 02:42:23,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=325080.0, ans=0.07 2024-08-10 02:42:57,743 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3550, loss[loss=0.1231, beats_loss=0.01264, ecapa_loss=0.0003031, whisper_loss=0.1074, over 21431.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01224, ecapa_loss=0.0003009, whisper_loss=0.09995, over 3902274.82 frames. ], batch size: 91, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:42:57,913 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 02:42:58,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=325380.0, ans=0.0 2024-08-10 02:43:01,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=12.0 2024-08-10 02:43:08,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=325380.0, ans=0.2 2024-08-10 02:43:18,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.54 vs.
limit=12.0 2024-08-10 02:43:42,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=325680.0, ans=0.125 2024-08-10 02:43:42,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=325680.0, ans=0.0 2024-08-10 02:43:44,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=325680.0, ans=0.025 2024-08-10 02:43:56,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=325780.0, ans=0.125 2024-08-10 02:44:07,112 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3600, loss[loss=0.09686, beats_loss=0.01443, ecapa_loss=0.0003887, whisper_loss=0.07855, over 12825.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.0122, ecapa_loss=0.0003004, whisper_loss=0.1002, over 3875346.07 frames. ], batch size: 58, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:44:15,775 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS 2024-08-10 02:44:23,881 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.942e+01 3.351e+01 3.815e+01 6.062e+01, threshold=6.702e+01, percent-clipped=0.0 2024-08-10 02:44:45,068 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts.
28 from LS+wenet, 15 from Vox, 34 from AS 2024-08-10 02:44:45,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=326080.0, ans=0.0 2024-08-10 02:44:45,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=326080.0, ans=0.07 2024-08-10 02:45:02,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=326280.0, ans=0.1 2024-08-10 02:45:04,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=326280.0, ans=0.125 2024-08-10 02:45:05,863 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 18 from Vox, 24 from AS 2024-08-10 02:45:09,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=326280.0, ans=0.125 2024-08-10 02:45:17,150 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3650, loss[loss=0.09131, beats_loss=0.0136, ecapa_loss=0.0002958, whisper_loss=0.07476, over 21013.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01233, ecapa_loss=0.0002992, whisper_loss=0.1004, over 3859609.90 frames. ], batch size: 86, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:45:26,360 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs.
limit=15.0 2024-08-10 02:45:33,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=326480.0, ans=10.0 2024-08-10 02:45:40,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=326480.0, ans=0.1 2024-08-10 02:45:51,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=326580.0, ans=0.0 2024-08-10 02:45:54,242 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 from AS 2024-08-10 02:45:58,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=326680.0, ans=0.0 2024-08-10 02:45:59,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=326680.0, ans=0.125 2024-08-10 02:46:24,639 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 02:46:25,743 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3700, loss[loss=0.1179, beats_loss=0.01187, ecapa_loss=0.0002785, whisper_loss=0.1033, over 18449.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01238, ecapa_loss=0.0002988, whisper_loss=0.1003, over 3856848.76 frames. ], batch size: 73, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:46:31,013 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.61 vs.
limit=22.5 2024-08-10 02:46:31,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=326880.0, ans=0.125 2024-08-10 02:46:40,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=326980.0, ans=0.035 2024-08-10 02:46:42,266 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.947e+01 3.360e+01 4.039e+01 7.794e+01, threshold=6.721e+01, percent-clipped=1.0 2024-08-10 02:47:06,130 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 02:47:15,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=327180.0, ans=0.0 2024-08-10 02:47:32,333 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.46 vs. limit=10.0 2024-08-10 02:47:32,866 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 from AS 2024-08-10 02:47:33,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3750, loss[loss=0.1113, beats_loss=0.01201, ecapa_loss=0.0003396, whisper_loss=0.09591, over 21973.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01245, ecapa_loss=0.0002994, whisper_loss=0.09968, over 3868206.23 frames. ], batch size: 91, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:47:34,848 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs.
limit=15.0 2024-08-10 02:47:37,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=327380.0, ans=0.1 2024-08-10 02:48:17,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=327680.0, ans=0.125 2024-08-10 02:48:22,500 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 from AS 2024-08-10 02:48:27,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=327680.0, ans=0.125 2024-08-10 02:48:37,307 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 from AS 2024-08-10 02:48:42,498 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3800, loss[loss=0.1239, beats_loss=0.01083, ecapa_loss=0.0003081, whisper_loss=0.11, over 21057.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01247, ecapa_loss=0.0002991, whisper_loss=0.09944, over 3853541.00 frames. ], batch size: 84, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:48:44,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=327880.0, ans=0.0 2024-08-10 02:48:53,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=327880.0, ans=0.025 2024-08-10 02:48:53,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=327880.0, ans=0.125 2024-08-10 02:48:58,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.560e+01 3.072e+01 3.520e+01 3.991e+01 6.360e+01, threshold=7.040e+01, percent-clipped=0.0 2024-08-10 02:49:01,053 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.20 vs.
limit=15.0 2024-08-10 02:49:09,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=328080.0, ans=0.0 2024-08-10 02:49:09,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=328080.0, ans=0.0 2024-08-10 02:49:29,734 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 23 from Vox, 34 from AS 2024-08-10 02:49:31,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=328180.0, ans=0.0 2024-08-10 02:49:37,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=328280.0, ans=0.0 2024-08-10 02:49:42,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=328280.0, ans=0.0 2024-08-10 02:49:43,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0 2024-08-10 02:49:47,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.14 vs. limit=6.0 2024-08-10 02:49:51,925 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3850, loss[loss=0.1196, beats_loss=0.01204, ecapa_loss=0.0002938, whisper_loss=0.1046, over 19399.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01253, ecapa_loss=0.0002973, whisper_loss=0.0993, over 3871088.74 frames. ], batch size: 78, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:50:03,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=328380.0, ans=0.0 2024-08-10 02:50:15,004 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.90 vs.
limit=15.0 2024-08-10 02:50:18,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=328580.0, ans=0.0 2024-08-10 02:50:23,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=328580.0, ans=0.0 2024-08-10 02:50:26,091 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-10 02:50:59,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3900, loss[loss=0.1302, beats_loss=0.01194, ecapa_loss=0.0003215, whisper_loss=0.1151, over 22339.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01252, ecapa_loss=0.0002989, whisper_loss=0.1005, over 3872357.59 frames. ], batch size: 91, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:51:05,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=328880.0, ans=0.2 2024-08-10 02:51:06,325 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 02:51:15,666 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 3.142e+01 3.558e+01 4.007e+01 5.949e+01, threshold=7.115e+01, percent-clipped=0.0 2024-08-10 02:51:16,254 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=4.745e-01 2024-08-10 02:51:17,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328980.0, ans=0.1 2024-08-10 02:51:25,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=329080.0, ans=0.125 2024-08-10 02:51:27,905 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 02:52:06,837 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 3950, loss[loss=0.1031, beats_loss=0.01518, ecapa_loss=0.0002366, whisper_loss=0.08558, over 17646.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01249, ecapa_loss=0.0003008, whisper_loss=0.1008, over 3905374.89 frames. ], batch size: 71, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:52:12,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-10 02:52:28,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=329480.0, ans=0.09899494936611666 2024-08-10 02:52:38,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2024-08-10 02:52:47,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=329680.0, ans=10.0 2024-08-10 02:52:52,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=329680.0, ans=0.125 2024-08-10 02:53:10,149 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 02:53:13,830 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4000, loss[loss=0.1133, beats_loss=0.01367, ecapa_loss=0.0002549, whisper_loss=0.09703, over 20964.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01252, ecapa_loss=0.0002971, whisper_loss=0.1006, over 3901402.04 frames. ], batch size: 83, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:53:16,716 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
30 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 02:53:26,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=329980.0, ans=0.2 2024-08-10 02:53:30,327 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+01 3.066e+01 3.430e+01 3.923e+01 5.367e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-10 02:53:32,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.90 vs. limit=15.0 2024-08-10 02:53:33,303 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 20 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 02:53:35,893 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 02:53:44,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=330080.0, ans=15.0 2024-08-10 02:53:46,559 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 02:53:52,420 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 02:54:15,498 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-10 02:54:21,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4050, loss[loss=0.1291, beats_loss=0.01203, ecapa_loss=0.0002781, whisper_loss=0.1143, over 22991.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01244, ecapa_loss=0.0002966, whisper_loss=0.101, over 3897918.03 frames. 
], batch size: 90, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:54:23,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=330380.0, ans=0.1 2024-08-10 02:54:34,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=330480.0, ans=0.125 2024-08-10 02:54:36,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=330480.0, ans=0.125 2024-08-10 02:54:42,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330480.0, ans=0.1 2024-08-10 02:54:46,216 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 02:54:47,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=330580.0, ans=0.125 2024-08-10 02:54:59,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2024-08-10 02:55:00,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=330580.0, ans=0.125 2024-08-10 02:55:18,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=330780.0, ans=0.07 2024-08-10 02:55:28,892 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4100, loss[loss=0.09977, beats_loss=0.01033, ecapa_loss=0.0003491, whisper_loss=0.08594, over 18591.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01245, ecapa_loss=0.0002983, whisper_loss=0.09979, over 3903582.15 frames. 
], batch size: 72, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:55:39,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=330880.0, ans=0.0 2024-08-10 02:55:43,912 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 02:55:44,892 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.941e+01 3.171e+01 3.928e+01 6.026e+01, threshold=6.343e+01, percent-clipped=0.0 2024-08-10 02:56:03,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2024-08-10 02:56:30,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=331280.0, ans=0.125 2024-08-10 02:56:35,910 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4150, loss[loss=0.1013, beats_loss=0.01257, ecapa_loss=0.0002953, whisper_loss=0.08575, over 21129.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.0124, ecapa_loss=0.0002998, whisper_loss=0.1, over 3865825.72 frames. ], batch size: 87, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:56:44,122 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 02:56:45,470 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 02:56:53,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=331480.0, ans=0.0 2024-08-10 02:57:05,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.02 vs. limit=15.0 2024-08-10 02:57:07,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.15 vs. 
limit=12.0 2024-08-10 02:57:33,345 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 02:57:33,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2024-08-10 02:57:35,978 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 02:57:42,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4200, loss[loss=0.1233, beats_loss=0.0121, ecapa_loss=0.00028, whisper_loss=0.1084, over 14915.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01236, ecapa_loss=0.0002984, whisper_loss=0.1003, over 3855364.51 frames. ], batch size: 57, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:57:47,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=331880.0, ans=0.1 2024-08-10 02:57:49,978 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2024-08-10 02:57:58,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.978e+01 3.511e+01 4.145e+01 7.481e+01, threshold=7.022e+01, percent-clipped=3.0 2024-08-10 02:57:59,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=331980.0, ans=0.125 2024-08-10 02:58:11,733 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.962e+00 2024-08-10 02:58:12,718 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
28 from LS+wenet, 15 from Vox, 49 fro AS 2024-08-10 02:58:18,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=332080.0, ans=0.0 2024-08-10 02:58:19,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=332080.0, ans=0.2 2024-08-10 02:58:22,046 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 34 from Vox, 26 fro AS 2024-08-10 02:58:22,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2024-08-10 02:58:23,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0 2024-08-10 02:58:28,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332180.0, ans=0.1 2024-08-10 02:58:29,638 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-10 02:58:40,261 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 02:58:46,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=332280.0, ans=0.1 2024-08-10 02:58:50,537 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4250, loss[loss=0.09968, beats_loss=0.01468, ecapa_loss=0.0002311, whisper_loss=0.08269, over 22701.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01237, ecapa_loss=0.0002971, whisper_loss=0.09934, over 3865288.90 frames. ], batch size: 91, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:58:50,711 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
25 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 02:59:16,046 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 02:59:29,463 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 02:59:32,349 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 8 from Vox, 33 fro AS 2024-08-10 02:59:33,766 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 02:59:42,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332680.0, ans=0.1 2024-08-10 02:59:43,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=332680.0, ans=0.125 2024-08-10 02:59:51,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=332780.0, ans=0.07 2024-08-10 02:59:54,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=332780.0, ans=0.0 2024-08-10 02:59:59,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4300, loss[loss=0.09735, beats_loss=0.01536, ecapa_loss=0.0003447, whisper_loss=0.07854, over 14378.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01232, ecapa_loss=0.000294, whisper_loss=0.1002, over 3864932.06 frames. ], batch size: 59, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 03:00:06,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=332880.0, ans=0.125 2024-08-10 03:00:15,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.965e+01 3.329e+01 3.837e+01 6.258e+01, threshold=6.658e+01, percent-clipped=0.0 2024-08-10 03:00:22,654 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
15 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 03:00:56,102 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 03:01:00,197 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 03:01:06,700 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4350, loss[loss=0.139, beats_loss=0.008433, ecapa_loss=0.0003701, whisper_loss=0.1269, over 18639.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01228, ecapa_loss=0.000297, whisper_loss=0.09959, over 3849115.47 frames. ], batch size: 73, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:01:07,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0 2024-08-10 03:01:12,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5 2024-08-10 03:01:18,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.27 vs. limit=12.0 2024-08-10 03:01:33,680 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 03:01:33,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=333580.0, ans=0.125 2024-08-10 03:01:37,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=333580.0, ans=0.0 2024-08-10 03:01:38,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.55 vs. limit=15.0 2024-08-10 03:01:52,962 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 03:02:03,532 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
15 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 03:02:03,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=333780.0, ans=0.04949747468305833 2024-08-10 03:02:13,917 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4400, loss[loss=0.1329, beats_loss=0.008652, ecapa_loss=0.0003145, whisper_loss=0.1212, over 19806.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01228, ecapa_loss=0.0002958, whisper_loss=0.09977, over 3824359.16 frames. ], batch size: 78, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:02:30,430 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 2.919e+01 3.248e+01 3.746e+01 6.587e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-10 03:02:34,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=333980.0, ans=0.0 2024-08-10 03:02:41,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=334080.0, ans=6.0 2024-08-10 03:03:00,074 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 03:03:12,702 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 03:03:14,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=334280.0, ans=0.125 2024-08-10 03:03:22,139 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4450, loss[loss=0.1129, beats_loss=0.01188, ecapa_loss=0.0002445, whisper_loss=0.09856, over 16388.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01237, ecapa_loss=0.0002946, whisper_loss=0.09977, over 3857510.31 frames. 
], batch size: 62, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:03:37,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=334480.0, ans=0.2 2024-08-10 03:03:49,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=334480.0, ans=0.125 2024-08-10 03:04:26,806 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 03:04:32,553 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 03:04:35,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4500, loss[loss=0.09783, beats_loss=0.01048, ecapa_loss=0.0003285, whisper_loss=0.08406, over 14838.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01233, ecapa_loss=0.0002942, whisper_loss=0.09976, over 3850136.38 frames. ], batch size: 60, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:04:47,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=334880.0, ans=0.125 2024-08-10 03:04:52,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.347e+01 3.021e+01 3.497e+01 4.022e+01 7.846e+01, threshold=6.995e+01, percent-clipped=4.0 2024-08-10 03:05:11,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=335080.0, ans=0.0 2024-08-10 03:05:15,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=335080.0, ans=0.2 2024-08-10 03:05:26,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=335180.0, ans=0.0 2024-08-10 03:05:46,516 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4550, loss[loss=0.1005, beats_loss=0.0161, ecapa_loss=0.0002401, whisper_loss=0.082, over 19515.00 frames. 
], tot_loss[loss=0.115, beats_loss=0.01236, ecapa_loss=0.0002936, whisper_loss=0.09974, over 3874669.08 frames. ], batch size: 78, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:05:46,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=335380.0, ans=0.2 2024-08-10 03:05:50,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=335380.0, ans=0.025 2024-08-10 03:05:50,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5 2024-08-10 03:06:34,671 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 03:06:48,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335780.0, ans=0.1 2024-08-10 03:06:58,490 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4600, loss[loss=0.1177, beats_loss=0.01353, ecapa_loss=0.0003234, whisper_loss=0.101, over 19046.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01245, ecapa_loss=0.0002947, whisper_loss=0.09876, over 3865511.71 frames. ], batch size: 81, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:06:59,349 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.12 vs. limit=6.0 2024-08-10 03:07:05,666 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 19 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-10 03:07:07,326 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 03:07:15,444 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 3.246e+01 3.685e+01 4.349e+01 7.107e+01, threshold=7.370e+01, percent-clipped=1.0 2024-08-10 03:08:00,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=336280.0, ans=0.125 2024-08-10 03:08:06,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=336280.0, ans=0.125 2024-08-10 03:08:10,310 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4650, loss[loss=0.1245, beats_loss=0.01133, ecapa_loss=0.000361, whisper_loss=0.1096, over 22763.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01241, ecapa_loss=0.0002978, whisper_loss=0.09897, over 3861496.63 frames. ], batch size: 94, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:08:12,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336380.0, ans=0.1 2024-08-10 03:08:33,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=336480.0, ans=0.0 2024-08-10 03:08:42,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2024-08-10 03:08:47,927 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 15 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-10 03:08:57,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-08-10 03:09:03,318 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
19 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-10 03:09:23,085 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4700, loss[loss=0.1269, beats_loss=0.008734, ecapa_loss=0.0003365, whisper_loss=0.1148, over 16197.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01241, ecapa_loss=0.0002986, whisper_loss=0.09957, over 3869135.50 frames. ], batch size: 65, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:09:31,901 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 32 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 03:09:38,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=336980.0, ans=0.0 2024-08-10 03:09:40,387 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 3.034e+01 3.372e+01 4.313e+01 2.367e+02, threshold=6.744e+01, percent-clipped=2.0 2024-08-10 03:09:41,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0 2024-08-10 03:09:45,544 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0 2024-08-10 03:09:58,912 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 03:10:00,406 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 03:10:01,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=337080.0, ans=0.05 2024-08-10 03:10:04,683 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 03:10:16,287 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
28 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 03:10:16,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=337180.0, ans=0.0 2024-08-10 03:10:33,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=337380.0, ans=0.5 2024-08-10 03:10:34,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4750, loss[loss=0.1209, beats_loss=0.01024, ecapa_loss=0.0003272, whisper_loss=0.1074, over 21156.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01235, ecapa_loss=0.0002976, whisper_loss=0.1011, over 3906840.65 frames. ], batch size: 84, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:10:36,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=337380.0, ans=0.125 2024-08-10 03:11:14,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=337580.0, ans=0.0 2024-08-10 03:11:21,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=337680.0, ans=0.0 2024-08-10 03:11:27,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=337680.0, ans=0.2 2024-08-10 03:11:31,576 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 03:11:40,016 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-10 03:11:47,153 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4800, loss[loss=0.1211, beats_loss=0.01228, ecapa_loss=0.0003495, whisper_loss=0.1053, over 22190.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01227, ecapa_loss=0.0002981, whisper_loss=0.1012, over 3911910.64 frames. 
], batch size: 93, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:11:47,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=337880.0, ans=0.125 2024-08-10 03:11:48,770 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-10 03:11:49,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=337880.0, ans=6.0 2024-08-10 03:11:50,427 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 03:12:03,984 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.006e+01 3.324e+01 3.733e+01 5.524e+01, threshold=6.647e+01, percent-clipped=0.0 2024-08-10 03:12:04,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-08-10 03:12:10,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=337980.0, ans=0.5 2024-08-10 03:12:12,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337980.0, ans=0.1 2024-08-10 03:12:30,497 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-10 03:12:33,531 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 03:12:39,266 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 03:12:40,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=338180.0, ans=0.125 2024-08-10 03:12:56,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=338280.0, ans=0.125 2024-08-10 03:12:58,621 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4850, loss[loss=0.0926, beats_loss=0.01466, ecapa_loss=0.0002511, whisper_loss=0.07543, over 22303.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01232, ecapa_loss=0.0002964, whisper_loss=0.1002, over 3919773.52 frames. ], batch size: 91, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:12:59,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=338380.0, ans=0.2 2024-08-10 03:13:01,529 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-10 03:13:02,915 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 03:13:10,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.77 vs. limit=22.5 2024-08-10 03:13:25,879 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 03:13:54,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=338780.0, ans=0.0 2024-08-10 03:14:11,017 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4900, loss[loss=0.1309, beats_loss=0.0121, ecapa_loss=0.0002993, whisper_loss=0.1158, over 20794.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01244, ecapa_loss=0.000296, whisper_loss=0.1001, over 3909902.54 frames. 
], batch size: 81, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:14:12,679 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 03:14:18,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=338880.0, ans=0.0 2024-08-10 03:14:26,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=338980.0, ans=0.05 2024-08-10 03:14:28,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+01 3.105e+01 3.477e+01 3.938e+01 7.192e+01, threshold=6.955e+01, percent-clipped=1.0 2024-08-10 03:14:45,228 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 03:14:48,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=339080.0, ans=0.0 2024-08-10 03:14:58,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2024-08-10 03:14:59,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=339180.0, ans=0.0 2024-08-10 03:15:07,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=339280.0, ans=0.2 2024-08-10 03:15:22,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 4950, loss[loss=0.1182, beats_loss=0.0139, ecapa_loss=0.0003133, whisper_loss=0.1012, over 22129.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01258, ecapa_loss=0.0002951, whisper_loss=0.09892, over 3890126.02 frames. ], batch size: 89, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:15:24,162 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
16 from LS+wenet, 27 from Vox, 21 fro AS 2024-08-10 03:15:28,670 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 03:15:28,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=339380.0, ans=0.125 2024-08-10 03:15:42,978 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-10 03:15:53,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.17 vs. limit=10.0 2024-08-10 03:15:57,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=339580.0, ans=0.125 2024-08-10 03:16:02,017 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 03:16:04,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=339580.0, ans=0.0 2024-08-10 03:16:19,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=339680.0, ans=0.0 2024-08-10 03:16:21,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=339780.0, ans=0.1 2024-08-10 03:16:25,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=339780.0, ans=0.0 2024-08-10 03:16:26,745 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 03:16:36,283 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5000, loss[loss=0.1024, beats_loss=0.01193, ecapa_loss=0.0002966, whisper_loss=0.08753, over 21501.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01258, ecapa_loss=0.0002948, whisper_loss=0.09875, over 3869723.05 frames. 
], batch size: 90, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:16:39,596 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 03:16:42,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=339880.0, ans=22.5 2024-08-10 03:16:53,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.921e+01 3.372e+01 3.826e+01 7.563e+01, threshold=6.744e+01, percent-clipped=1.0 2024-08-10 03:17:15,779 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-10 03:17:16,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=340080.0, ans=0.0 2024-08-10 03:17:20,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=340180.0, ans=0.0 2024-08-10 03:17:33,074 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 21 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-10 03:17:34,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=340280.0, ans=0.0 2024-08-10 03:17:41,455 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 03:17:48,498 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5050, loss[loss=0.1109, beats_loss=0.01205, ecapa_loss=0.00028, whisper_loss=0.09608, over 17266.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01258, ecapa_loss=0.000296, whisper_loss=0.09898, over 3889190.76 frames. ], batch size: 68, lr: 1.95e-02, grad_scale: 4194304.0 2024-08-10 03:17:50,151 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 03:18:15,038 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
23 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-10 03:18:15,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=340480.0, ans=0.2 2024-08-10 03:18:18,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=340580.0, ans=0.0 2024-08-10 03:18:36,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.22 vs. limit=10.0 2024-08-10 03:18:38,944 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-10 03:18:40,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=340680.0, ans=0.2 2024-08-10 03:18:41,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=340680.0, ans=0.125 2024-08-10 03:19:01,791 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5100, loss[loss=0.1159, beats_loss=0.01217, ecapa_loss=0.0002675, whisper_loss=0.1011, over 20017.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01269, ecapa_loss=0.0002937, whisper_loss=0.09791, over 3898396.83 frames. ], batch size: 78, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:19:19,756 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.388e+01 2.974e+01 3.405e+01 3.841e+01 8.729e+01, threshold=6.810e+01, percent-clipped=2.0 2024-08-10 03:19:23,246 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-10 03:19:27,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340980.0, ans=0.1 2024-08-10 03:19:33,778 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 03:19:48,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=341180.0, ans=0.0 2024-08-10 03:19:57,588 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.91 vs. limit=22.5 2024-08-10 03:20:14,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=341280.0, ans=0.1 2024-08-10 03:20:17,341 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5150, loss[loss=0.1264, beats_loss=0.00981, ecapa_loss=0.0002935, whisper_loss=0.1137, over 16541.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01263, ecapa_loss=0.0002925, whisper_loss=0.09822, over 3885538.95 frames. ], batch size: 64, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:20:31,340 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 03:20:37,649 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 03:20:57,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2024-08-10 03:21:18,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=341780.0, ans=0.2 2024-08-10 03:21:30,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=341780.0, ans=0.0 2024-08-10 03:21:32,996 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5200, loss[loss=0.1138, beats_loss=0.008112, ecapa_loss=0.0003239, whisper_loss=0.1024, over 15021.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01248, ecapa_loss=0.0002935, whisper_loss=0.09857, over 3887453.57 frames. 
], batch size: 56, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:21:33,879 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-08-10 03:21:48,617 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 03:21:51,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.930e+01 3.270e+01 3.996e+01 6.105e+01, threshold=6.539e+01, percent-clipped=0.0 2024-08-10 03:21:56,407 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-08-10 03:22:10,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=342080.0, ans=0.125 2024-08-10 03:22:13,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=342080.0, ans=0.0 2024-08-10 03:22:25,295 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 19 from LS+wenet, 30 from Vox, 42 fro AS 2024-08-10 03:22:30,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=342180.0, ans=0.125 2024-08-10 03:22:32,345 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-10 03:22:42,610 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 03:22:46,703 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5250, loss[loss=0.1148, beats_loss=0.01189, ecapa_loss=0.0003032, whisper_loss=0.09987, over 19442.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01238, ecapa_loss=0.0002933, whisper_loss=0.09852, over 3847670.82 frames. 
], batch size: 75, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:22:51,747 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 03:23:03,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.85 vs. limit=5.0 2024-08-10 03:23:10,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.93 vs. limit=10.0 2024-08-10 03:23:22,908 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.474e+01 2024-08-10 03:23:26,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=342580.0, ans=0.0 2024-08-10 03:23:40,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342680.0, ans=0.1 2024-08-10 03:23:40,635 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2024-08-10 03:23:43,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=342680.0, ans=0.125 2024-08-10 03:23:52,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342780.0, ans=0.1 2024-08-10 03:24:02,519 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5300, loss[loss=0.1338, beats_loss=0.01288, ecapa_loss=0.0002358, whisper_loss=0.1186, over 22736.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01235, ecapa_loss=0.0002946, whisper_loss=0.09894, over 3863751.95 frames. ], batch size: 88, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:24:07,271 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 03:24:07,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.87 vs. limit=15.0 2024-08-10 03:24:13,744 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2024-08-10 03:24:17,626 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 03:24:19,532 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.354e+05 2024-08-10 03:24:20,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.874e+01 3.315e+01 3.923e+01 7.752e+01, threshold=6.630e+01, percent-clipped=2.0 2024-08-10 03:24:23,668 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.99 vs. limit=15.0 2024-08-10 03:24:25,818 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 03:24:36,074 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 03:24:39,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343080.0, ans=0.1 2024-08-10 03:24:43,366 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-10 03:24:46,124 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 03:25:02,580 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 03:25:05,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=343280.0, ans=10.0 2024-08-10 03:25:07,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=343280.0, ans=0.95 2024-08-10 03:25:07,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=12.0 2024-08-10 03:25:10,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=343280.0, ans=0.125 2024-08-10 03:25:13,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=343280.0, ans=0.125 2024-08-10 03:25:15,713 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5350, loss[loss=0.1106, beats_loss=0.01254, ecapa_loss=0.0003504, whisper_loss=0.09457, over 22067.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01234, ecapa_loss=0.0002939, whisper_loss=0.09867, over 3889296.76 frames. ], batch size: 93, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:25:23,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=343380.0, ans=0.1 2024-08-10 03:25:50,044 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 03:25:50,339 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.186e+03 2024-08-10 03:25:50,970 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.22 vs. 
limit=15.0 2024-08-10 03:25:53,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=343580.0, ans=0.02 2024-08-10 03:26:00,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=343680.0, ans=0.125 2024-08-10 03:26:06,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343680.0, ans=0.1 2024-08-10 03:26:21,822 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 27 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-10 03:26:26,672 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 03:26:30,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=343780.0, ans=0.0 2024-08-10 03:26:32,427 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5400, loss[loss=0.1301, beats_loss=0.0117, ecapa_loss=0.0002802, whisper_loss=0.1156, over 22993.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01234, ecapa_loss=0.0002926, whisper_loss=0.09921, over 3902697.40 frames. ], batch size: 92, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:26:32,817 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 13 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-10 03:26:50,395 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.952e+01 3.404e+01 3.987e+01 5.856e+01, threshold=6.808e+01, percent-clipped=0.0 2024-08-10 03:26:56,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=343980.0, ans=0.125 2024-08-10 03:27:00,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=343980.0, ans=0.125 2024-08-10 03:27:05,457 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
26 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 03:27:05,975 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2024-08-10 03:27:15,912 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 03:27:16,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=344180.0, ans=0.1 2024-08-10 03:27:28,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=344180.0, ans=0.0 2024-08-10 03:27:45,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=344380.0, ans=0.125 2024-08-10 03:27:46,554 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5450, loss[loss=0.1274, beats_loss=0.01186, ecapa_loss=0.0002305, whisper_loss=0.1132, over 15207.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01235, ecapa_loss=0.0002924, whisper_loss=0.09928, over 3876462.08 frames. ], batch size: 56, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:28:01,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=344480.0, ans=0.125 2024-08-10 03:28:05,308 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 03:28:19,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=344580.0, ans=0.1 2024-08-10 03:28:25,514 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 03:28:31,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=344680.0, ans=0.0 2024-08-10 03:28:47,313 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 37 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 03:29:03,759 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5500, loss[loss=0.0879, beats_loss=0.01295, ecapa_loss=0.0002713, whisper_loss=0.07223, over 17670.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01234, ecapa_loss=0.0002932, whisper_loss=0.09884, over 3849603.25 frames. ], batch size: 69, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:29:06,078 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 03:29:16,713 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 03:29:20,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2024-08-10 03:29:22,160 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.947e+01 3.297e+01 3.879e+01 5.625e+01, threshold=6.594e+01, percent-clipped=0.0 2024-08-10 03:29:25,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=344980.0, ans=0.1 2024-08-10 03:29:45,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2024-08-10 03:29:46,429 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 03:29:57,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.56 vs. 
limit=22.5 2024-08-10 03:30:19,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5550, loss[loss=0.0916, beats_loss=0.01194, ecapa_loss=0.0003197, whisper_loss=0.07646, over 17211.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01237, ecapa_loss=0.0002935, whisper_loss=0.09889, over 3882510.70 frames. ], batch size: 67, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:30:19,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=345380.0, ans=0.1 2024-08-10 03:30:26,661 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 29 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 03:30:29,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=345380.0, ans=0.0 2024-08-10 03:30:46,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=345480.0, ans=0.1 2024-08-10 03:30:51,923 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0 2024-08-10 03:30:52,668 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 03:30:54,204 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 03:31:33,129 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 03:31:35,571 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5600, loss[loss=0.1143, beats_loss=0.01419, ecapa_loss=0.0002925, whisper_loss=0.09715, over 22471.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01235, ecapa_loss=0.000293, whisper_loss=0.09901, over 3916155.76 frames. 
], batch size: 93, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:31:38,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=345880.0, ans=0.0 2024-08-10 03:31:52,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=345980.0, ans=0.0 2024-08-10 03:31:53,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.950e+01 3.327e+01 3.865e+01 5.194e+01, threshold=6.655e+01, percent-clipped=0.0 2024-08-10 03:32:08,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=346080.0, ans=0.125 2024-08-10 03:32:32,201 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.54 vs. limit=10.0 2024-08-10 03:32:33,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=346280.0, ans=0.1 2024-08-10 03:32:36,572 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 03:32:49,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5650, loss[loss=0.09508, beats_loss=0.01356, ecapa_loss=0.0003025, whisper_loss=0.07849, over 19404.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01234, ecapa_loss=0.0002944, whisper_loss=0.09844, over 3925439.64 frames. ], batch size: 82, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:33:29,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=346580.0, ans=0.125 2024-08-10 03:33:36,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=346680.0, ans=0.0 2024-08-10 03:34:04,331 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 03:34:04,911 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.27 vs. limit=15.0 2024-08-10 03:34:05,477 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5700, loss[loss=0.1189, beats_loss=0.0108, ecapa_loss=0.0003076, whisper_loss=0.1051, over 21231.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01238, ecapa_loss=0.0002931, whisper_loss=0.09874, over 3929679.33 frames. ], batch size: 88, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:34:07,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=346880.0, ans=0.125 2024-08-10 03:34:23,397 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.410e+01 2.923e+01 3.363e+01 4.122e+01 7.176e+01, threshold=6.726e+01, percent-clipped=2.0 2024-08-10 03:34:25,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=346980.0, ans=0.1 2024-08-10 03:34:30,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=346980.0, ans=0.125 2024-08-10 03:34:31,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=346980.0, ans=0.125 2024-08-10 03:34:49,503 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 03:34:59,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=347180.0, ans=0.025 2024-08-10 03:35:02,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.07 vs. 
limit=15.0 2024-08-10 03:35:08,272 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-10 03:35:08,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347280.0, ans=0.1 2024-08-10 03:35:19,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=347280.0, ans=0.125 2024-08-10 03:35:22,092 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5750, loss[loss=0.1079, beats_loss=0.01402, ecapa_loss=0.0003019, whisper_loss=0.09085, over 17447.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01235, ecapa_loss=0.0002918, whisper_loss=0.09926, over 3926368.50 frames. ], batch size: 72, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:35:51,827 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 03:36:03,758 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0 2024-08-10 03:36:09,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=347680.0, ans=0.2 2024-08-10 03:36:12,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=347680.0, ans=0.0 2024-08-10 03:36:20,598 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 03:36:36,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5800, loss[loss=0.1416, beats_loss=0.009672, ecapa_loss=0.0002925, whisper_loss=0.129, over 23426.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01244, ecapa_loss=0.0002942, whisper_loss=0.09857, over 3916768.60 frames. 
], batch size: 91, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:36:41,426 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 03:36:52,032 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 03:36:54,526 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.824e+01 3.290e+01 3.735e+01 8.555e+01, threshold=6.581e+01, percent-clipped=2.0 2024-08-10 03:36:58,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=347980.0, ans=0.125 2024-08-10 03:37:08,213 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 03:37:50,797 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5850, loss[loss=0.1222, beats_loss=0.01015, ecapa_loss=0.0002342, whisper_loss=0.1097, over 17433.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01245, ecapa_loss=0.0002949, whisper_loss=0.0988, over 3917357.24 frames. ], batch size: 64, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:37:59,629 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 03:37:59,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=348380.0, ans=0.125 2024-08-10 03:38:13,434 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 03:38:30,629 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 03:38:33,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=348680.0, ans=0.2 2024-08-10 03:39:00,762 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5900, loss[loss=0.1218, beats_loss=0.01381, ecapa_loss=0.0002506, whisper_loss=0.1054, over 23356.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01246, ecapa_loss=0.0002933, whisper_loss=0.09876, over 3930998.03 frames. ], batch size: 95, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:39:02,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=348880.0, ans=0.125 2024-08-10 03:39:14,586 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.88 vs. limit=10.0 2024-08-10 03:39:16,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.929e+01 3.311e+01 3.794e+01 5.610e+01, threshold=6.621e+01, percent-clipped=0.0 2024-08-10 03:39:33,152 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 03:39:37,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=349080.0, ans=0.125 2024-08-10 03:39:57,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=349280.0, ans=0.125 2024-08-10 03:40:09,358 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 5950, loss[loss=0.1127, beats_loss=0.01126, ecapa_loss=0.0003421, whisper_loss=0.09799, over 18774.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01241, ecapa_loss=0.0002929, whisper_loss=0.09901, over 3936008.65 frames. 
], batch size: 78, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:40:10,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.41 vs. limit=10.0 2024-08-10 03:40:20,453 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 from AS 2024-08-10 03:40:32,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=349480.0, ans=0.125 2024-08-10 03:40:36,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=349580.0, ans=0.0 2024-08-10 03:40:39,254 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 from AS 2024-08-10 03:40:44,975 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 33 from Vox, 33 from AS 2024-08-10 03:40:49,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=349680.0, ans=0.125 2024-08-10 03:40:55,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=349680.0, ans=0.0 2024-08-10 03:40:58,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=349680.0, ans=0.125 2024-08-10 03:41:12,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=349780.0, ans=0.125 2024-08-10 03:41:14,850 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 13 from Vox, 30 from AS 2024-08-10 03:41:18,606 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6000, loss[loss=0.1075, beats_loss=0.01497, ecapa_loss=0.0002683, whisper_loss=0.08989, over 18234.00 frames.
], tot_loss[loss=0.1152, beats_loss=0.01229, ecapa_loss=0.0002932, whisper_loss=0.1, over 3924405.35 frames. ], batch size: 75, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:41:18,607 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 03:41:57,738 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on ASR_libri: loss=0.2761, beats_loss=0, ecapa_loss=0.0008742, whisper_loss=0.2674, over 922467.00 frames. 2024-08-10 03:42:15,745 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on SV_voxceleb1: loss=0.007667, beats_loss=0, ecapa_loss=0.0007667, whisper_loss=0, over 939242.00 frames. 2024-08-10 03:44:14,964 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on AT_audioset: loss=0.0285, beats_loss=0.0285, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 03:44:14,968 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 03:44:28,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=349980.0, ans=0.0 2024-08-10 03:44:32,124 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 3.043e+01 3.498e+01 4.267e+01 5.483e+01, threshold=6.996e+01, percent-clipped=0.0 2024-08-10 03:44:34,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=349980.0, ans=0.1 2024-08-10 03:44:37,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=349980.0, ans=0.0 2024-08-10 03:44:46,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=350080.0, ans=0.125 2024-08-10 03:44:48,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=350080.0, ans=0.0 2024-08-10 03:44:56,469 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
25 from LS+wenet, 15 from Vox, 22 from AS 2024-08-10 03:44:58,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=350180.0, ans=0.2 2024-08-10 03:45:03,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=350180.0, ans=0.0 2024-08-10 03:45:06,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=350180.0, ans=0.025 2024-08-10 03:45:18,309 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 from AS 2024-08-10 03:45:18,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0 2024-08-10 03:45:20,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=350280.0, ans=0.0 2024-08-10 03:45:25,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=350380.0, ans=0.04949747468305833 2024-08-10 03:45:26,539 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6050, loss[loss=0.122, beats_loss=0.01167, ecapa_loss=0.0002506, whisper_loss=0.1078, over 14375.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01219, ecapa_loss=0.000291, whisper_loss=0.1, over 3878496.33 frames. ], batch size: 55, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:45:46,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=350480.0, ans=0.125 2024-08-10 03:45:51,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.49 vs.
limit=15.0 2024-08-10 03:45:56,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=350580.0, ans=0.125 2024-08-10 03:46:12,015 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 from AS 2024-08-10 03:46:36,532 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6100, loss[loss=0.1128, beats_loss=0.01003, ecapa_loss=0.000273, whisper_loss=0.1, over 17829.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01221, ecapa_loss=0.0002894, whisper_loss=0.09999, over 3906133.24 frames. ], batch size: 69, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:46:45,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=350880.0, ans=0.125 2024-08-10 03:46:53,156 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.971e+01 3.424e+01 4.102e+01 1.085e+02, threshold=6.848e+01, percent-clipped=1.0 2024-08-10 03:46:57,785 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 03:47:04,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=351080.0, ans=0.125 2024-08-10 03:47:22,647 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 from AS 2024-08-10 03:47:43,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=351280.0, ans=0.0 2024-08-10 03:47:45,799 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6150, loss[loss=0.1132, beats_loss=0.01159, ecapa_loss=0.0002581, whisper_loss=0.09902, over 17811.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01235, ecapa_loss=0.0002894, whisper_loss=0.09925, over 3915766.16 frames. ], batch size: 69, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:47:46,041 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts.
24 from LS+wenet, 25 from Vox, 23 from AS 2024-08-10 03:48:25,196 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 19 from Vox, 41 from AS 2024-08-10 03:48:25,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=351580.0, ans=0.125 2024-08-10 03:48:29,095 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 from AS 2024-08-10 03:48:30,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=351680.0, ans=0.125 2024-08-10 03:48:49,463 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 from AS 2024-08-10 03:48:54,929 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6200, loss[loss=0.1144, beats_loss=0.0133, ecapa_loss=0.0002606, whisper_loss=0.09845, over 23661.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01233, ecapa_loss=0.0002887, whisper_loss=0.09947, over 3913345.56 frames. ], batch size: 91, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:48:56,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=351880.0, ans=0.0 2024-08-10 03:48:59,090 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 from AS 2024-08-10 03:48:59,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=351880.0, ans=0.125 2024-08-10 03:48:59,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351880.0, ans=0.1 2024-08-10 03:48:59,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=351880.0, ans=0.125 2024-08-10 03:49:02,014 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts.
13 from LS+wenet, 18 from Vox, 30 from AS 2024-08-10 03:49:04,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=351880.0, ans=0.2 2024-08-10 03:49:11,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.480e+01 3.031e+01 3.409e+01 3.924e+01 5.999e+01, threshold=6.819e+01, percent-clipped=0.0 2024-08-10 03:49:15,565 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 from AS 2024-08-10 03:49:15,943 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.711e+00 2024-08-10 03:49:18,375 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 12 from LS+wenet, 15 from Vox, 33 from AS 2024-08-10 03:49:32,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=352080.0, ans=0.95 2024-08-10 03:49:33,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=352080.0, ans=0.125 2024-08-10 03:49:36,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=352180.0, ans=0.125 2024-08-10 03:49:48,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=352280.0, ans=0.0 2024-08-10 03:49:54,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=352280.0, ans=0.0 2024-08-10 03:50:01,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.29 vs. limit=15.0 2024-08-10 03:50:02,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6250, loss[loss=0.1055, beats_loss=0.0137, ecapa_loss=0.0003016, whisper_loss=0.08883, over 17850.00 frames.
], tot_loss[loss=0.1141, beats_loss=0.01237, ecapa_loss=0.0002899, whisper_loss=0.09882, over 3872615.01 frames. ], batch size: 74, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:50:43,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=352680.0, ans=0.0 2024-08-10 03:50:55,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=352680.0, ans=0.5 2024-08-10 03:50:57,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=352780.0, ans=0.125 2024-08-10 03:50:59,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=352780.0, ans=0.2 2024-08-10 03:51:01,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=352780.0, ans=0.125 2024-08-10 03:51:03,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=352780.0, ans=0.05 2024-08-10 03:51:09,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=352880.0, ans=0.2 2024-08-10 03:51:10,707 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6300, loss[loss=0.134, beats_loss=0.01223, ecapa_loss=0.0003301, whisper_loss=0.1185, over 18139.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01239, ecapa_loss=0.000293, whisper_loss=0.09901, over 3895444.87 frames. 
], batch size: 70, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:51:25,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=352980.0, ans=0.125 2024-08-10 03:51:27,319 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 3.057e+01 3.444e+01 4.179e+01 1.718e+02, threshold=6.888e+01, percent-clipped=1.0 2024-08-10 03:51:37,125 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 03:51:58,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=353180.0, ans=0.5 2024-08-10 03:52:10,321 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 from AS 2024-08-10 03:52:19,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6350, loss[loss=0.1145, beats_loss=0.01262, ecapa_loss=0.0002566, whisper_loss=0.09933, over 20656.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01238, ecapa_loss=0.0002939, whisper_loss=0.09865, over 3862884.23 frames. ], batch size: 83, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:52:24,881 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.53 vs. limit=5.0 2024-08-10 03:52:27,980 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS 2024-08-10 03:52:54,289 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts.
27 from LS+wenet, 25 from Vox, 37 from AS 2024-08-10 03:52:54,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=353580.0, ans=0.0 2024-08-10 03:53:09,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=353680.0, ans=0.2 2024-08-10 03:53:22,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=353780.0, ans=0.125 2024-08-10 03:53:28,573 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6400, loss[loss=0.1238, beats_loss=0.01016, ecapa_loss=0.0003473, whisper_loss=0.1102, over 15726.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01234, ecapa_loss=0.0002935, whisper_loss=0.09957, over 3883084.73 frames. ], batch size: 62, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:53:31,314 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 from AS 2024-08-10 03:53:32,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=353880.0, ans=0.1 2024-08-10 03:53:34,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=353880.0, ans=0.0 2024-08-10 03:53:44,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.873e+01 3.233e+01 3.602e+01 5.742e+01, threshold=6.465e+01, percent-clipped=0.0 2024-08-10 03:53:51,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=353980.0, ans=0.2 2024-08-10 03:53:56,126 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts.
29 from LS+wenet, 21 from Vox, 45 from AS 2024-08-10 03:54:00,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=354080.0, ans=0.2 2024-08-10 03:54:03,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.45 vs. limit=10.0 2024-08-10 03:54:10,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=354180.0, ans=0.0 2024-08-10 03:54:16,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=354180.0, ans=0.125 2024-08-10 03:54:24,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=354280.0, ans=0.125 2024-08-10 03:54:24,847 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 from AS 2024-08-10 03:54:25,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=354280.0, ans=0.025 2024-08-10 03:54:37,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6450, loss[loss=0.1287, beats_loss=0.01141, ecapa_loss=0.0003019, whisper_loss=0.1142, over 20446.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01235, ecapa_loss=0.0002927, whisper_loss=0.09991, over 3930050.93 frames. ], batch size: 83, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:54:50,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=354480.0, ans=0.1 2024-08-10 03:55:09,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.26 vs.
limit=15.0 2024-08-10 03:55:16,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=354580.0, ans=0.125 2024-08-10 03:55:24,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=354680.0, ans=0.0 2024-08-10 03:55:45,995 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6500, loss[loss=0.1126, beats_loss=0.01422, ecapa_loss=0.0002525, whisper_loss=0.09585, over 22601.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01232, ecapa_loss=0.0002923, whisper_loss=0.1001, over 3908686.88 frames. ], batch size: 92, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:55:56,861 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 from AS 2024-08-10 03:56:02,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.914e+01 3.314e+01 3.758e+01 6.768e+01, threshold=6.629e+01, percent-clipped=1.0 2024-08-10 03:56:04,804 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 from AS 2024-08-10 03:56:09,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=354980.0, ans=0.0 2024-08-10 03:56:11,003 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.69 vs. limit=15.0 2024-08-10 03:56:13,065 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 17 from Vox, 40 from AS 2024-08-10 03:56:15,828 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts.
22 from LS+wenet, 20 from Vox, 34 from AS 2024-08-10 03:56:22,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=355080.0, ans=0.0 2024-08-10 03:56:28,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=355180.0, ans=0.1 2024-08-10 03:56:40,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=355280.0, ans=6.0 2024-08-10 03:56:41,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=355280.0, ans=0.125 2024-08-10 03:56:50,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=355280.0, ans=0.125 2024-08-10 03:56:53,837 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6550, loss[loss=0.1184, beats_loss=0.01392, ecapa_loss=0.0003, whisper_loss=0.1015, over 16664.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01235, ecapa_loss=0.000293, whisper_loss=0.1004, over 3914565.45 frames. ], batch size: 68, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:56:55,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=355380.0, ans=0.125 2024-08-10 03:56:58,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=355380.0, ans=0.1 2024-08-10 03:56:59,104 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 13 from Vox, 30 from AS 2024-08-10 03:57:10,232 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts.
22 from LS+wenet, 18 from Vox, 32 from AS 2024-08-10 03:57:18,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=355480.0, ans=0.0 2024-08-10 03:58:01,562 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6600, loss[loss=0.1137, beats_loss=0.01092, ecapa_loss=0.0003263, whisper_loss=0.09955, over 21374.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01236, ecapa_loss=0.0002949, whisper_loss=0.1003, over 3917621.01 frames. ], batch size: 89, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 03:58:08,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=355880.0, ans=0.125 2024-08-10 03:58:18,201 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 3.136e+01 3.510e+01 4.053e+01 6.821e+01, threshold=7.019e+01, percent-clipped=1.0 2024-08-10 03:58:25,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=355980.0, ans=0.125 2024-08-10 03:58:25,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-08-10 03:58:37,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=356080.0, ans=0.125 2024-08-10 03:58:45,660 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 18 from LS+wenet, 21 from Vox, 46 from AS 2024-08-10 03:58:50,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=356180.0, ans=0.125 2024-08-10 03:58:53,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=356180.0, ans=0.0 2024-08-10 03:59:00,995 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts.
31 from LS+wenet, 20 from Vox, 44 from AS 2024-08-10 03:59:02,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=356280.0, ans=0.0 2024-08-10 03:59:04,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=356280.0, ans=0.2 2024-08-10 03:59:08,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=356280.0, ans=0.125 2024-08-10 03:59:10,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6650, loss[loss=0.1096, beats_loss=0.01286, ecapa_loss=0.0003439, whisper_loss=0.09335, over 21917.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01235, ecapa_loss=0.000295, whisper_loss=0.1003, over 3893665.50 frames. ], batch size: 92, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 03:59:14,877 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 28 from Vox, 33 from AS 2024-08-10 03:59:20,739 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 03:59:49,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-10 04:00:04,026 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 21 from LS+wenet, 20 from Vox, 46 from AS 2024-08-10 04:00:10,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-10 04:00:11,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356780.0, ans=0.1 2024-08-10 04:00:19,740 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6700, loss[loss=0.1311, beats_loss=0.01147, ecapa_loss=0.00038, whisper_loss=0.1158, over 21674.00 frames.
], tot_loss[loss=0.1152, beats_loss=0.01242, ecapa_loss=0.0002936, whisper_loss=0.09985, over 3905158.36 frames. ], batch size: 94, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:00:26,214 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0 2024-08-10 04:00:35,947 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.886e+01 3.265e+01 3.693e+01 7.385e+01, threshold=6.529e+01, percent-clipped=1.0 2024-08-10 04:00:38,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=356980.0, ans=0.125 2024-08-10 04:00:48,717 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 from AS 2024-08-10 04:01:00,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=357180.0, ans=0.0 2024-08-10 04:01:01,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357180.0, ans=0.1 2024-08-10 04:01:08,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=357180.0, ans=0.1 2024-08-10 04:01:27,506 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 from AS 2024-08-10 04:01:28,575 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6750, loss[loss=0.1126, beats_loss=0.01393, ecapa_loss=0.0002835, whisper_loss=0.09588, over 18464.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.0125, ecapa_loss=0.0002903, whisper_loss=0.09918, over 3899305.81 frames.
], batch size: 77, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:01:38,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=357380.0, ans=0.0 2024-08-10 04:01:40,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=357380.0, ans=0.2 2024-08-10 04:01:41,781 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=27.61 vs. limit=22.5 2024-08-10 04:01:59,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-08-10 04:02:02,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0 2024-08-10 04:02:10,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=357680.0, ans=0.125 2024-08-10 04:02:10,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=357680.0, ans=0.0 2024-08-10 04:02:18,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=357680.0, ans=0.125 2024-08-10 04:02:29,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=357780.0, ans=0.0 2024-08-10 04:02:35,972 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 24 from Vox, 35 from AS 2024-08-10 04:02:37,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6800, loss[loss=0.09569, beats_loss=0.0132, ecapa_loss=0.0002616, whisper_loss=0.07987, over 19773.00 frames.
], tot_loss[loss=0.1143, beats_loss=0.01244, ecapa_loss=0.0002931, whisper_loss=0.09895, over 3871194.93 frames. ], batch size: 77, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:02:37,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=357880.0, ans=0.125 2024-08-10 04:02:40,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=357880.0, ans=0.125 2024-08-10 04:02:41,442 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 19 from Vox, 20 from AS 2024-08-10 04:02:54,010 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.434e+01 2.970e+01 3.321e+01 3.801e+01 1.301e+02, threshold=6.643e+01, percent-clipped=3.0 2024-08-10 04:02:58,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=357980.0, ans=0.0 2024-08-10 04:03:01,653 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.947e+00 2024-08-10 04:03:02,918 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 21 from LS+wenet, 20 from Vox, 53 from AS 2024-08-10 04:03:10,348 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=15.0 2024-08-10 04:03:29,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=358180.0, ans=0.0 2024-08-10 04:03:32,053 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts.
29 from LS+wenet, 29 from Vox, 35 from AS 2024-08-10 04:03:34,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=358280.0, ans=0.125 2024-08-10 04:03:46,623 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6850, loss[loss=0.1409, beats_loss=0.01122, ecapa_loss=0.0003503, whisper_loss=0.1262, over 22596.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01251, ecapa_loss=0.0002921, whisper_loss=0.09826, over 3858905.18 frames. ], batch size: 93, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:03:56,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=358380.0, ans=0.2 2024-08-10 04:04:02,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=358480.0, ans=0.125 2024-08-10 04:04:11,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=358480.0, ans=0.0 2024-08-10 04:04:22,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=358580.0, ans=0.125 2024-08-10 04:04:35,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=15.0 2024-08-10 04:04:43,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.88 vs. limit=15.0 2024-08-10 04:04:46,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=358780.0, ans=0.1 2024-08-10 04:04:54,966 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6900, loss[loss=0.1233, beats_loss=0.01236, ecapa_loss=0.0002953, whisper_loss=0.108, over 21920.00 frames.
], tot_loss[loss=0.1135, beats_loss=0.01254, ecapa_loss=0.0002904, whisper_loss=0.09801, over 3854339.61 frames. ], batch size: 86, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:05:02,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=358880.0, ans=0.125 2024-08-10 04:05:06,720 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 24 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-10 04:05:10,687 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.982e+01 3.330e+01 3.890e+01 5.660e+01, threshold=6.660e+01, percent-clipped=0.0 2024-08-10 04:06:03,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 6950, loss[loss=0.0933, beats_loss=0.01245, ecapa_loss=0.0003357, whisper_loss=0.0775, over 19719.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01251, ecapa_loss=0.0002919, whisper_loss=0.09887, over 3886572.85 frames. ], batch size: 87, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:06:09,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=359380.0, ans=0.0 2024-08-10 04:06:18,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=359480.0, ans=0.0 2024-08-10 04:06:25,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=359480.0, ans=0.0 2024-08-10 04:06:36,195 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 27 from Vox, 20 fro AS 2024-08-10 04:06:38,711 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-10 04:06:48,533 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 04:06:52,782 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-10 04:06:55,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=359680.0, ans=0.125 2024-08-10 04:06:56,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=15.0 2024-08-10 04:07:13,281 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7000, loss[loss=0.1317, beats_loss=0.009512, ecapa_loss=0.000357, whisper_loss=0.1186, over 17316.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01239, ecapa_loss=0.0002931, whisper_loss=0.09903, over 3872577.80 frames. ], batch size: 70, lr: 1.89e-02, grad_scale: 4194304.0 2024-08-10 04:07:32,798 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.852e+01 3.263e+01 3.844e+01 5.295e+01, threshold=6.525e+01, percent-clipped=0.0 2024-08-10 04:07:34,626 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 04:08:21,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-08-10 04:08:24,245 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7050, loss[loss=0.1234, beats_loss=0.01103, ecapa_loss=0.0003157, whisper_loss=0.1092, over 19754.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01231, ecapa_loss=0.0002927, whisper_loss=0.0995, over 3864585.19 frames. ], batch size: 79, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:08:30,137 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 04:08:39,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.57 vs. limit=22.5 2024-08-10 04:08:41,059 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 04:09:11,642 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 04:09:27,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360780.0, ans=0.1 2024-08-10 04:09:32,608 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7100, loss[loss=0.104, beats_loss=0.01009, ecapa_loss=0.0002966, whisper_loss=0.09091, over 18438.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01233, ecapa_loss=0.0002901, whisper_loss=0.09946, over 3849834.63 frames. ], batch size: 72, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:09:40,899 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 04:09:44,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=22.5 2024-08-10 04:09:48,986 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 3.064e+01 3.569e+01 4.090e+01 1.167e+02, threshold=7.137e+01, percent-clipped=2.0 2024-08-10 04:09:51,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=12.0 2024-08-10 04:10:12,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=361080.0, ans=0.125 2024-08-10 04:10:30,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.65 vs. limit=15.0 2024-08-10 04:10:41,348 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7150, loss[loss=0.11, beats_loss=0.0118, ecapa_loss=0.0002872, whisper_loss=0.09535, over 20149.00 frames. 
], tot_loss[loss=0.115, beats_loss=0.01236, ecapa_loss=0.0002888, whisper_loss=0.09974, over 3875868.78 frames. ], batch size: 79, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:10:44,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=361380.0, ans=0.125 2024-08-10 04:10:54,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=361480.0, ans=0.09899494936611666 2024-08-10 04:10:57,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=361480.0, ans=0.125 2024-08-10 04:11:22,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-08-10 04:11:31,030 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.79 vs. limit=22.5 2024-08-10 04:11:46,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-08-10 04:11:50,804 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7200, loss[loss=0.141, beats_loss=0.01042, ecapa_loss=0.0002855, whisper_loss=0.1278, over 24532.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01236, ecapa_loss=0.0002856, whisper_loss=0.09999, over 3878296.32 frames. ], batch size: 92, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:11:59,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.60 vs. 
limit=15.0 2024-08-10 04:12:07,357 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.897e+01 3.291e+01 3.668e+01 6.348e+01, threshold=6.581e+01, percent-clipped=0.0 2024-08-10 04:12:07,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=361980.0, ans=0.125 2024-08-10 04:12:09,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=361980.0, ans=0.125 2024-08-10 04:12:43,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=362180.0, ans=0.125 2024-08-10 04:12:48,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=362280.0, ans=0.0 2024-08-10 04:13:01,580 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7250, loss[loss=0.1168, beats_loss=0.01231, ecapa_loss=0.0002765, whisper_loss=0.1017, over 22997.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01236, ecapa_loss=0.0002851, whisper_loss=0.1005, over 3883309.39 frames. ], batch size: 92, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:13:10,724 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=12.0 2024-08-10 04:13:12,173 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. 
limit=15.0 2024-08-10 04:13:29,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=362580.0, ans=0.125 2024-08-10 04:13:35,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=362580.0, ans=0.125 2024-08-10 04:13:36,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=362580.0, ans=0.0 2024-08-10 04:13:52,028 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 04:14:03,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=362780.0, ans=0.0 2024-08-10 04:14:12,794 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7300, loss[loss=0.1257, beats_loss=0.01068, ecapa_loss=0.0002826, whisper_loss=0.1122, over 23256.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01232, ecapa_loss=0.0002886, whisper_loss=0.1005, over 3854034.44 frames. ], batch size: 91, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:14:21,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=362880.0, ans=0.2 2024-08-10 04:14:30,489 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.958e+01 3.364e+01 4.070e+01 6.476e+01, threshold=6.728e+01, percent-clipped=0.0 2024-08-10 04:14:40,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.32 vs. limit=22.5 2024-08-10 04:14:52,313 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.17 vs. 
limit=15.0 2024-08-10 04:14:58,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=363180.0, ans=0.1 2024-08-10 04:15:04,149 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-10 04:15:23,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=363380.0, ans=0.05 2024-08-10 04:15:24,217 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7350, loss[loss=0.1139, beats_loss=0.01115, ecapa_loss=0.0002538, whisper_loss=0.1002, over 21717.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01235, ecapa_loss=0.0002889, whisper_loss=0.1001, over 3858494.67 frames. ], batch size: 84, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:15:38,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.71 vs. limit=10.0 2024-08-10 04:15:41,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=363480.0, ans=0.2 2024-08-10 04:15:46,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=363480.0, ans=0.0 2024-08-10 04:15:49,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=363480.0, ans=0.0 2024-08-10 04:15:52,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=363580.0, ans=0.09899494936611666 2024-08-10 04:15:53,816 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
28 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 04:16:09,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=363680.0, ans=0.125 2024-08-10 04:16:11,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=363680.0, ans=0.1 2024-08-10 04:16:14,414 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 04:16:16,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=363680.0, ans=0.0 2024-08-10 04:16:34,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=363780.0, ans=0.125 2024-08-10 04:16:36,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=363880.0, ans=0.0 2024-08-10 04:16:36,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=363880.0, ans=0.125 2024-08-10 04:16:36,996 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7400, loss[loss=0.105, beats_loss=0.01185, ecapa_loss=0.0003512, whisper_loss=0.08961, over 17368.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01245, ecapa_loss=0.0002859, whisper_loss=0.1003, over 3876743.52 frames. ], batch size: 70, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:16:45,484 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.21 vs. 
limit=15.0 2024-08-10 04:16:54,658 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 3.042e+01 3.418e+01 4.034e+01 8.204e+01, threshold=6.837e+01, percent-clipped=2.0 2024-08-10 04:17:04,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=364080.0, ans=0.0 2024-08-10 04:17:08,690 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-10 04:17:16,235 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-10 04:17:19,163 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 33 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-10 04:17:26,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364180.0, ans=0.1 2024-08-10 04:17:26,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=364180.0, ans=0.125 2024-08-10 04:17:31,513 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 04:17:38,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.43 vs. limit=10.0 2024-08-10 04:17:44,028 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 04:17:45,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=364280.0, ans=0.125 2024-08-10 04:17:47,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. 
limit=15.0 2024-08-10 04:17:49,387 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7450, loss[loss=0.1135, beats_loss=0.01385, ecapa_loss=0.0002848, whisper_loss=0.09676, over 21616.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01242, ecapa_loss=0.0002893, whisper_loss=0.1002, over 3898476.25 frames. ], batch size: 89, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:17:51,569 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0 2024-08-10 04:18:02,841 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-10 04:18:04,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=364480.0, ans=0.125 2024-08-10 04:18:11,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=364480.0, ans=0.04949747468305833 2024-08-10 04:18:35,970 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 04:18:41,493 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 04:19:03,246 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7500, loss[loss=0.1166, beats_loss=0.01233, ecapa_loss=0.000272, whisper_loss=0.1016, over 16683.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01241, ecapa_loss=0.0002893, whisper_loss=0.1003, over 3889617.06 frames. ], batch size: 69, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:19:09,475 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 04:19:15,196 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 04:19:20,569 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.956e+01 3.355e+01 3.815e+01 8.528e+01, threshold=6.709e+01, percent-clipped=1.0 2024-08-10 04:19:31,460 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 04:19:51,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=365180.0, ans=0.125 2024-08-10 04:19:55,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=365180.0, ans=0.1 2024-08-10 04:19:55,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=365180.0, ans=0.125 2024-08-10 04:20:01,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2024-08-10 04:20:11,214 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 04:20:16,892 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7550, loss[loss=0.1269, beats_loss=0.01258, ecapa_loss=0.0002857, whisper_loss=0.1115, over 17034.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01242, ecapa_loss=0.00029, whisper_loss=0.1005, over 3890096.79 frames. 
], batch size: 64, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:20:20,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=365380.0, ans=0.125 2024-08-10 04:20:29,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365380.0, ans=0.1 2024-08-10 04:20:40,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=365480.0, ans=0.125 2024-08-10 04:20:40,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=365480.0, ans=0.125 2024-08-10 04:20:46,837 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=4.46 vs. limit=15.0 2024-08-10 04:20:47,339 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 04:20:49,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=365580.0, ans=0.125 2024-08-10 04:21:08,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=365680.0, ans=0.125 2024-08-10 04:21:30,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7600, loss[loss=0.1101, beats_loss=0.01264, ecapa_loss=0.0002971, whisper_loss=0.09447, over 15364.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01244, ecapa_loss=0.0002919, whisper_loss=0.09992, over 3867885.22 frames. ], batch size: 61, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:21:34,533 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 04:21:46,855 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+01 3.083e+01 3.503e+01 3.988e+01 6.295e+01, threshold=7.005e+01, percent-clipped=0.0 2024-08-10 04:22:28,596 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-10 04:22:30,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=366280.0, ans=0.1 2024-08-10 04:22:42,562 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7650, loss[loss=0.1091, beats_loss=0.01422, ecapa_loss=0.0002641, whisper_loss=0.09225, over 18291.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01242, ecapa_loss=0.0002911, whisper_loss=0.09913, over 3852326.49 frames. ], batch size: 73, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:22:46,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=366380.0, ans=0.025 2024-08-10 04:23:03,541 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 04:23:05,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=366480.0, ans=0.2 2024-08-10 04:23:08,045 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
20 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-10 04:23:08,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=366480.0, ans=12.0 2024-08-10 04:23:29,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=366680.0, ans=0.125 2024-08-10 04:23:37,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=366680.0, ans=0.0 2024-08-10 04:23:38,674 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-10 04:23:46,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=366780.0, ans=0.125 2024-08-10 04:23:48,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=366780.0, ans=0.09899494936611666 2024-08-10 04:23:54,103 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7700, loss[loss=0.1064, beats_loss=0.0103, ecapa_loss=0.0002973, whisper_loss=0.09317, over 20075.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01243, ecapa_loss=0.0002908, whisper_loss=0.09881, over 3855441.00 frames. ], batch size: 80, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:24:02,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=366880.0, ans=0.07 2024-08-10 04:24:10,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=366980.0, ans=0.2 2024-08-10 04:24:12,161 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.962e+01 3.373e+01 3.972e+01 7.552e+01, threshold=6.745e+01, percent-clipped=1.0 2024-08-10 04:24:33,651 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
20 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-10 04:24:43,405 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2024-08-10 04:25:03,070 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 04:25:06,923 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7750, loss[loss=0.1152, beats_loss=0.01143, ecapa_loss=0.0003217, whisper_loss=0.1005, over 18485.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01232, ecapa_loss=0.0002911, whisper_loss=0.09956, over 3895738.68 frames. ], batch size: 76, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:25:07,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=367380.0, ans=0.0 2024-08-10 04:25:13,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=367380.0, ans=0.125 2024-08-10 04:25:19,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=367380.0, ans=0.125 2024-08-10 04:25:25,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=367480.0, ans=0.125 2024-08-10 04:25:26,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0 2024-08-10 04:25:41,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=367580.0, ans=0.0 2024-08-10 04:25:47,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=367580.0, ans=0.025 2024-08-10 04:26:07,985 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 04:26:10,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0 2024-08-10 04:26:18,455 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7800, loss[loss=0.09359, beats_loss=0.01405, ecapa_loss=0.000253, whisper_loss=0.07701, over 22378.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01232, ecapa_loss=0.0002886, whisper_loss=0.09983, over 3902142.23 frames. ], batch size: 93, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:26:25,374 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 21 from LS+wenet, 22 from Vox, 53 fro AS 2024-08-10 04:26:34,756 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.424e+01 3.081e+01 3.363e+01 3.893e+01 6.913e+01, threshold=6.726e+01, percent-clipped=1.0 2024-08-10 04:26:39,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=367980.0, ans=0.125 2024-08-10 04:26:54,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.05 vs. limit=15.0 2024-08-10 04:26:56,117 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 19 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 04:27:00,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=368180.0, ans=0.07 2024-08-10 04:27:02,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.63 vs. 
limit=10.0 2024-08-10 04:27:04,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=368180.0, ans=0.025 2024-08-10 04:27:05,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=368180.0, ans=0.125 2024-08-10 04:27:08,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=368180.0, ans=0.0 2024-08-10 04:27:17,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=368280.0, ans=0.125 2024-08-10 04:27:28,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7850, loss[loss=0.1224, beats_loss=0.0123, ecapa_loss=0.0003148, whisper_loss=0.1069, over 22906.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01242, ecapa_loss=0.0002899, whisper_loss=0.09928, over 3894179.77 frames. ], batch size: 93, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:27:47,863 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-10 04:27:50,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=368480.0, ans=0.125 2024-08-10 04:28:01,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=368580.0, ans=0.2 2024-08-10 04:28:35,917 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 04:28:38,454 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7900, loss[loss=0.1004, beats_loss=0.01336, ecapa_loss=0.0002442, whisper_loss=0.08457, over 15909.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01243, ecapa_loss=0.0002896, whisper_loss=0.09942, over 3891844.13 frames. 
], batch size: 62, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:28:39,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=368880.0, ans=0.0 2024-08-10 04:28:42,661 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 30 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 04:28:54,644 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+01 2.975e+01 3.379e+01 4.027e+01 6.816e+01, threshold=6.758e+01, percent-clipped=1.0 2024-08-10 04:28:56,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=368980.0, ans=0.1 2024-08-10 04:29:02,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.28 vs. limit=15.0 2024-08-10 04:29:23,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=369180.0, ans=0.2 2024-08-10 04:29:35,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=369280.0, ans=0.1 2024-08-10 04:29:42,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=369280.0, ans=0.125 2024-08-10 04:29:47,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 7950, loss[loss=0.09692, beats_loss=0.0114, ecapa_loss=0.0002915, whisper_loss=0.08261, over 17871.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01237, ecapa_loss=0.0002906, whisper_loss=0.0994, over 3885199.18 frames. ], batch size: 71, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:29:56,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=369380.0, ans=0.09899494936611666 2024-08-10 04:29:59,324 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 04:29:59,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=369380.0, ans=0.125 2024-08-10 04:30:03,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.08 vs. limit=15.0 2024-08-10 04:30:07,508 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.847e-01 2024-08-10 04:30:19,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=369580.0, ans=0.1 2024-08-10 04:30:25,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=369580.0, ans=0.1 2024-08-10 04:30:28,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=369680.0, ans=0.125 2024-08-10 04:30:36,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=369680.0, ans=0.0 2024-08-10 04:30:45,546 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 04:30:48,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=369780.0, ans=0.125 2024-08-10 04:30:56,740 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8000, loss[loss=0.09807, beats_loss=0.01681, ecapa_loss=0.0001984, whisper_loss=0.07928, over 23138.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01237, ecapa_loss=0.0002878, whisper_loss=0.09895, over 3892151.84 frames. 
], batch size: 94, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:31:13,357 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 3.026e+01 3.341e+01 3.954e+01 6.055e+01, threshold=6.681e+01, percent-clipped=0.0 2024-08-10 04:31:24,420 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 04:31:28,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370080.0, ans=0.1 2024-08-10 04:31:29,873 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 04:31:38,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=370180.0, ans=0.0 2024-08-10 04:31:50,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=370280.0, ans=0.0 2024-08-10 04:32:05,339 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8050, loss[loss=0.09048, beats_loss=0.01143, ecapa_loss=0.000285, whisper_loss=0.07619, over 15192.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01237, ecapa_loss=0.0002869, whisper_loss=0.0988, over 3860830.49 frames. ], batch size: 58, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:32:08,916 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0 2024-08-10 04:32:21,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=370480.0, ans=0.2 2024-08-10 04:32:22,924 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 04:32:29,662 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 04:32:35,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=370580.0, ans=0.125 2024-08-10 04:32:49,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=370680.0, ans=0.125 2024-08-10 04:33:03,848 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0 2024-08-10 04:33:04,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=370780.0, ans=0.125 2024-08-10 04:33:14,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=370880.0, ans=0.125 2024-08-10 04:33:14,972 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8100, loss[loss=0.1094, beats_loss=0.012, ecapa_loss=0.000291, whisper_loss=0.0945, over 21877.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01232, ecapa_loss=0.0002872, whisper_loss=0.09906, over 3838560.23 frames. 
], batch size: 87, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:33:22,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=370880.0, ans=0.0 2024-08-10 04:33:31,451 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.921e+01 3.268e+01 3.818e+01 1.425e+02, threshold=6.536e+01, percent-clipped=1.0 2024-08-10 04:33:37,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=370980.0, ans=0.125 2024-08-10 04:33:37,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=370980.0, ans=0.0 2024-08-10 04:33:39,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370980.0, ans=0.1 2024-08-10 04:33:40,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370980.0, ans=0.1 2024-08-10 04:33:43,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=371080.0, ans=0.0 2024-08-10 04:33:48,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=371080.0, ans=0.2 2024-08-10 04:33:49,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.22 vs. limit=15.0 2024-08-10 04:33:53,961 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
25 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 04:33:56,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371180.0, ans=0.1 2024-08-10 04:34:07,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=371180.0, ans=0.0 2024-08-10 04:34:21,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.53 vs. limit=10.0 2024-08-10 04:34:23,701 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8150, loss[loss=0.1256, beats_loss=0.01002, ecapa_loss=0.000351, whisper_loss=0.1121, over 21917.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01219, ecapa_loss=0.0002893, whisper_loss=0.1002, over 3848545.13 frames. ], batch size: 91, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:35:18,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=371780.0, ans=0.125 2024-08-10 04:35:19,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=371780.0, ans=0.125 2024-08-10 04:35:27,988 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-10 04:35:30,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=371880.0, ans=0.125 2024-08-10 04:35:31,873 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8200, loss[loss=0.1193, beats_loss=0.01309, ecapa_loss=0.0002092, whisper_loss=0.1041, over 22053.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01225, ecapa_loss=0.0002896, whisper_loss=0.1005, over 3868012.68 frames. 
], batch size: 83, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:35:47,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.14 vs. limit=15.0 2024-08-10 04:35:48,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.405e+01 2.993e+01 3.348e+01 3.834e+01 8.342e+01, threshold=6.697e+01, percent-clipped=3.0 2024-08-10 04:35:49,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=371980.0, ans=0.0 2024-08-10 04:36:32,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372280.0, ans=0.1 2024-08-10 04:36:33,938 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 28 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 04:36:39,566 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-10 04:36:42,582 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8250, loss[loss=0.1263, beats_loss=0.01012, ecapa_loss=0.0003048, whisper_loss=0.1131, over 17260.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01233, ecapa_loss=0.00029, whisper_loss=0.09898, over 3855903.58 frames. ], batch size: 67, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:36:50,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-10 04:36:54,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=372380.0, ans=0.0 2024-08-10 04:37:05,366 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
36 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 04:37:18,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=372580.0, ans=0.125 2024-08-10 04:37:31,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=22.5 2024-08-10 04:37:31,899 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 04:37:36,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=372680.0, ans=0.0 2024-08-10 04:37:45,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.64 vs. limit=15.0 2024-08-10 04:37:52,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=372780.0, ans=0.0 2024-08-10 04:37:54,842 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8300, loss[loss=0.1154, beats_loss=0.01588, ecapa_loss=0.0002107, whisper_loss=0.09739, over 21063.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01237, ecapa_loss=0.0002899, whisper_loss=0.09867, over 3871865.63 frames. ], batch size: 81, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:38:07,849 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
20 from LS+wenet, 21 from Vox, 54 fro AS 2024-08-10 04:38:09,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=372980.0, ans=0.0 2024-08-10 04:38:12,518 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.364e+01 3.111e+01 3.544e+01 4.051e+01 1.362e+02, threshold=7.088e+01, percent-clipped=2.0 2024-08-10 04:38:20,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=22.5 2024-08-10 04:38:34,435 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 04:38:39,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=373180.0, ans=0.125 2024-08-10 04:38:43,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=373180.0, ans=0.125 2024-08-10 04:39:07,689 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8350, loss[loss=0.1093, beats_loss=0.01104, ecapa_loss=0.0003242, whisper_loss=0.09504, over 19329.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01237, ecapa_loss=0.0002886, whisper_loss=0.0995, over 3916542.00 frames. ], batch size: 78, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:39:20,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=373480.0, ans=0.0 2024-08-10 04:39:21,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=373480.0, ans=0.125 2024-08-10 04:39:27,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=373480.0, ans=0.125 2024-08-10 04:39:35,747 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-10 04:39:36,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=373580.0, ans=0.0 2024-08-10 04:39:58,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.28 vs. limit=22.5 2024-08-10 04:40:26,161 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8400, loss[loss=0.1361, beats_loss=0.009411, ecapa_loss=0.0003257, whisper_loss=0.1234, over 18109.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01234, ecapa_loss=0.0002897, whisper_loss=0.09972, over 3896308.12 frames. ], batch size: 70, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:40:30,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=373880.0, ans=0.0 2024-08-10 04:40:31,691 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 04:40:31,977 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.880e+05 2024-08-10 04:40:33,635 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 04:40:37,663 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-10 04:40:48,128 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 2.937e+01 3.360e+01 3.795e+01 5.469e+01, threshold=6.721e+01, percent-clipped=0.0 2024-08-10 04:40:59,494 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-10 04:41:26,266 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
19 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-10 04:41:31,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=374180.0, ans=0.0 2024-08-10 04:41:35,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=374180.0, ans=0.125 2024-08-10 04:41:41,977 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.85 vs. limit=22.5 2024-08-10 04:41:42,988 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 38 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 04:41:47,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374280.0, ans=0.1 2024-08-10 04:41:57,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8450, loss[loss=0.1306, beats_loss=0.01161, ecapa_loss=0.0002356, whisper_loss=0.1167, over 23839.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01231, ecapa_loss=0.0002903, whisper_loss=0.1006, over 3903575.98 frames. ], batch size: 90, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:42:04,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=374380.0, ans=0.0 2024-08-10 04:42:35,671 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=9.940e-02 2024-08-10 04:42:49,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=374580.0, ans=0.0 2024-08-10 04:42:52,204 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 04:42:52,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=374680.0, ans=0.0 2024-08-10 04:42:56,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374680.0, ans=0.1 2024-08-10 04:43:12,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=374780.0, ans=0.07 2024-08-10 04:43:20,737 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 21 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-10 04:43:23,787 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 04:43:27,382 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8500, loss[loss=0.1121, beats_loss=0.01162, ecapa_loss=0.0002834, whisper_loss=0.0976, over 21250.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01235, ecapa_loss=0.0002911, whisper_loss=0.09987, over 3929262.39 frames. ], batch size: 83, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:43:37,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=374880.0, ans=0.0 2024-08-10 04:43:41,792 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
21 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-10 04:43:47,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=374980.0, ans=0.125 2024-08-10 04:43:48,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 3.067e+01 3.351e+01 3.844e+01 5.655e+01, threshold=6.702e+01, percent-clipped=0.0 2024-08-10 04:44:02,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=375080.0, ans=0.125 2024-08-10 04:44:13,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=375080.0, ans=0.125 2024-08-10 04:44:54,617 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8550, loss[loss=0.09497, beats_loss=0.01512, ecapa_loss=0.0003078, whisper_loss=0.07677, over 13430.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01236, ecapa_loss=0.0002882, whisper_loss=0.09964, over 3909907.45 frames. ], batch size: 57, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:45:05,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=375380.0, ans=0.125 2024-08-10 04:45:09,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=375380.0, ans=0.125 2024-08-10 04:45:27,369 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.23 vs. limit=15.0 2024-08-10 04:45:33,904 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 04:45:40,954 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 12 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 04:45:50,873 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 04:45:52,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=375680.0, ans=0.0 2024-08-10 04:46:11,357 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8600, loss[loss=0.09968, beats_loss=0.01186, ecapa_loss=0.0002759, whisper_loss=0.08506, over 19801.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01235, ecapa_loss=0.0002865, whisper_loss=0.09931, over 3898921.87 frames. ], batch size: 79, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:46:22,206 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-10 04:46:27,837 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 3.077e+01 3.509e+01 3.969e+01 6.307e+01, threshold=7.019e+01, percent-clipped=0.0 2024-08-10 04:46:32,243 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-10 04:46:37,753 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 04:46:44,169 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0 2024-08-10 04:47:10,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=376280.0, ans=0.2 2024-08-10 04:47:10,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=376280.0, ans=0.125 2024-08-10 04:47:21,468 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8650, loss[loss=0.1465, beats_loss=0.009517, ecapa_loss=0.0003168, whisper_loss=0.1338, over 22173.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01234, ecapa_loss=0.0002889, whisper_loss=0.09902, over 3918714.09 frames. 
], batch size: 89, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:47:38,452 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-10 04:47:54,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=376580.0, ans=0.125 2024-08-10 04:47:55,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=376580.0, ans=0.0 2024-08-10 04:47:58,351 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 04:48:10,915 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-10 04:48:16,205 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 04:48:17,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=376780.0, ans=0.125 2024-08-10 04:48:29,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.49 vs. limit=15.0 2024-08-10 04:48:31,421 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8700, loss[loss=0.1123, beats_loss=0.01134, ecapa_loss=0.000286, whisper_loss=0.09812, over 14915.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01227, ecapa_loss=0.0002894, whisper_loss=0.09988, over 3922055.49 frames. ], batch size: 57, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:48:31,712 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
18 from LS+wenet, 25 from Vox, 50 fro AS 2024-08-10 04:48:34,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=376880.0, ans=0.2 2024-08-10 04:48:47,791 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.537e+01 3.015e+01 3.371e+01 3.912e+01 6.380e+01, threshold=6.741e+01, percent-clipped=0.0 2024-08-10 04:49:10,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=377080.0, ans=0.0 2024-08-10 04:49:14,544 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-08-10 04:49:19,699 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 25 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-10 04:49:21,500 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 04:49:23,824 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 04:49:31,823 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 04:49:39,865 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8750, loss[loss=0.105, beats_loss=0.0148, ecapa_loss=0.0003126, whisper_loss=0.08703, over 21178.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01238, ecapa_loss=0.0002901, whisper_loss=0.09865, over 3909366.30 frames. ], batch size: 93, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:49:48,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=377380.0, ans=0.0 2024-08-10 04:49:56,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=377480.0, ans=0.0 2024-08-10 04:50:09,597 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-10 04:50:31,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2024-08-10 04:50:32,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=377680.0, ans=0.0 2024-08-10 04:50:33,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=377780.0, ans=0.125 2024-08-10 04:50:37,655 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-08-10 04:50:42,617 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 04:50:47,881 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8800, loss[loss=0.08996, beats_loss=0.01291, ecapa_loss=0.0003106, whisper_loss=0.07395, over 14716.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01239, ecapa_loss=0.0002906, whisper_loss=0.09927, over 3915810.24 frames. ], batch size: 63, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:50:48,512 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.968e-01 2024-08-10 04:50:52,153 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 04:50:52,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=377880.0, ans=0.0 2024-08-10 04:50:52,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=377880.0, ans=15.0 2024-08-10 04:51:04,436 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.359e+01 3.131e+01 3.473e+01 4.096e+01 6.875e+01, threshold=6.946e+01, percent-clipped=1.0 2024-08-10 04:51:10,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=377980.0, ans=0.1 2024-08-10 04:51:21,194 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 04:51:23,946 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 04:51:37,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5 2024-08-10 04:51:42,449 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-10 04:51:57,630 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8850, loss[loss=0.1035, beats_loss=0.01456, ecapa_loss=0.0002776, whisper_loss=0.08612, over 21383.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01245, ecapa_loss=0.0002907, whisper_loss=0.09878, over 3915650.94 frames. ], batch size: 88, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:51:58,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=378380.0, ans=0.125 2024-08-10 04:52:00,648 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-10 04:52:01,918 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
29 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 04:52:05,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=15.0 2024-08-10 04:52:06,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=378380.0, ans=0.0 2024-08-10 04:52:15,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0 2024-08-10 04:52:26,867 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-10 04:52:27,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=378580.0, ans=0.0 2024-08-10 04:52:28,865 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2024-08-10 04:52:42,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=378680.0, ans=0.2 2024-08-10 04:53:02,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=378780.0, ans=0.0 2024-08-10 04:53:04,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378880.0, ans=0.1 2024-08-10 04:53:05,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8900, loss[loss=0.1171, beats_loss=0.01151, ecapa_loss=0.0003169, whisper_loss=0.1024, over 21048.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01241, ecapa_loss=0.0002891, whisper_loss=0.09856, over 3884111.34 frames. 
], batch size: 87, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:53:11,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=378880.0, ans=0.125 2024-08-10 04:53:22,393 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.485e+01 3.017e+01 3.379e+01 3.848e+01 7.752e+01, threshold=6.759e+01, percent-clipped=1.0 2024-08-10 04:53:28,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=378980.0, ans=0.125 2024-08-10 04:53:31,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-10 04:53:40,156 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-10 04:53:43,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=379080.0, ans=0.025 2024-08-10 04:53:43,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=379080.0, ans=0.125 2024-08-10 04:53:49,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2024-08-10 04:53:59,669 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 04:54:14,273 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 8950, loss[loss=0.1273, beats_loss=0.01322, ecapa_loss=0.0003085, whisper_loss=0.111, over 21864.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01226, ecapa_loss=0.0002895, whisper_loss=0.09939, over 3894662.30 frames. 
], batch size: 88, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:54:14,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=379380.0, ans=0.125 2024-08-10 04:54:20,747 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 04:54:38,126 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2024-08-10 04:54:43,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=379580.0, ans=0.0 2024-08-10 04:54:52,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=379580.0, ans=0.1 2024-08-10 04:55:22,704 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9000, loss[loss=0.1104, beats_loss=0.01128, ecapa_loss=0.0002615, whisper_loss=0.09646, over 19707.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01227, ecapa_loss=0.0002902, whisper_loss=0.09919, over 3897246.71 frames. ], batch size: 77, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:55:22,704 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 04:56:01,286 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on ASR_libri: loss=0.2773, beats_loss=0, ecapa_loss=0.0008691, whisper_loss=0.2686, over 922467.00 frames. 2024-08-10 04:56:19,267 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on SV_voxceleb1: loss=0.007577, beats_loss=0, ecapa_loss=0.0007577, whisper_loss=0, over 939242.00 frames. 2024-08-10 04:58:16,696 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on AT_audioset: loss=0.02874, beats_loss=0.02874, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
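The per-batch `loss=` and `tot_loss=` values above are consistent with a weighted sum of the three teacher losses using the scales in the run config (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). A minimal sketch of that combination, with hypothetical function names (this is not the icefall API, only an illustration checked against the logged numbers):

```python
# Assumed loss scales, taken from the config header of this run:
#   beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0
BEATS_SCALE, ECAPA_SCALE, WHISPER_SCALE = 1.0, 10.0, 1.0

def total_loss(beats_loss: float, ecapa_loss: float, whisper_loss: float) -> float:
    """Weighted sum of the per-teacher losses, matching the logged `loss=` field."""
    return (BEATS_SCALE * beats_loss
            + ECAPA_SCALE * ecapa_loss
            + WHISPER_SCALE * whisper_loss)

# Values from the "Epoch 3, batch 9000" tot_loss entry above:
tot = total_loss(beats_loss=0.01227, ecapa_loss=0.0002902, whisper_loss=0.09919)
```

With these inputs `tot` comes out at roughly 0.1144, matching the logged `tot_loss`. The same structure explains the validation block above: each validation set exercises only one head (ASR_libri reports `beats_loss=0`, SV_voxceleb1 only the ecapa term, AT_audioset only the beats term), so the other terms contribute zero.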
2024-08-10 04:58:16,700 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 04:58:21,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=379880.0, ans=0.0 2024-08-10 04:58:33,113 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 3.022e+01 3.372e+01 4.052e+01 6.376e+01, threshold=6.745e+01, percent-clipped=0.0 2024-08-10 04:58:46,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=380080.0, ans=0.125 2024-08-10 04:58:56,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.97 vs. limit=6.0 2024-08-10 04:58:59,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=380180.0, ans=0.0 2024-08-10 04:59:12,991 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.58 vs. limit=22.5 2024-08-10 04:59:21,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=380280.0, ans=0.125 2024-08-10 04:59:21,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380280.0, ans=0.1 2024-08-10 04:59:25,801 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9050, loss[loss=0.124, beats_loss=0.01175, ecapa_loss=0.0002814, whisper_loss=0.1095, over 21843.00 frames. ], tot_loss[loss=0.115, beats_loss=0.0122, ecapa_loss=0.000291, whisper_loss=0.0999, over 3882039.25 frames. 
], batch size: 87, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 04:59:27,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=380380.0, ans=0.125 2024-08-10 04:59:29,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=380380.0, ans=0.0 2024-08-10 04:59:30,333 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 04:59:45,253 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-10 04:59:56,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=380580.0, ans=0.2 2024-08-10 04:59:59,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=380580.0, ans=0.2 2024-08-10 05:00:06,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=380680.0, ans=0.95 2024-08-10 05:00:09,966 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-10 05:00:16,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.76 vs. limit=22.5 2024-08-10 05:00:33,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=380880.0, ans=0.125 2024-08-10 05:00:34,541 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9100, loss[loss=0.1197, beats_loss=0.01298, ecapa_loss=0.0003888, whisper_loss=0.1028, over 20836.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01219, ecapa_loss=0.0002902, whisper_loss=0.1001, over 3914053.79 frames. 
], batch size: 88, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:00:40,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=380880.0, ans=0.0 2024-08-10 05:00:51,277 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.151e+01 2.817e+01 3.235e+01 3.647e+01 7.816e+01, threshold=6.470e+01, percent-clipped=1.0 2024-08-10 05:01:06,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=381080.0, ans=0.125 2024-08-10 05:01:17,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=381180.0, ans=0.125 2024-08-10 05:01:22,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2024-08-10 05:01:39,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=381280.0, ans=0.0 2024-08-10 05:01:43,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9150, loss[loss=0.0964, beats_loss=0.01373, ecapa_loss=0.000273, whisper_loss=0.07994, over 21976.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01224, ecapa_loss=0.0002889, whisper_loss=0.1001, over 3907938.30 frames. ], batch size: 91, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:01:48,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=381380.0, ans=0.125 2024-08-10 05:01:53,515 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 05:02:08,148 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.27 vs. 
limit=15.0 2024-08-10 05:02:16,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-10 05:02:17,078 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 05:02:33,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=381680.0, ans=0.125 2024-08-10 05:02:34,220 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=15.53 vs. limit=15.0 2024-08-10 05:02:52,629 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9200, loss[loss=0.1096, beats_loss=0.01206, ecapa_loss=0.0003552, whisper_loss=0.09397, over 19521.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01225, ecapa_loss=0.0002929, whisper_loss=0.09996, over 3883411.44 frames. ], batch size: 81, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:02:58,480 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 05:02:58,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=381880.0, ans=0.125 2024-08-10 05:02:59,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=15.0 2024-08-10 05:03:08,919 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 3.038e+01 3.317e+01 3.849e+01 8.293e+01, threshold=6.633e+01, percent-clipped=1.0 2024-08-10 05:03:09,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=381980.0, ans=0.125 2024-08-10 05:03:13,301 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 05:03:28,890 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5 2024-08-10 05:03:46,394 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 05:04:00,839 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9250, loss[loss=0.1153, beats_loss=0.01256, ecapa_loss=0.0002762, whisper_loss=0.1, over 20863.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01225, ecapa_loss=0.0002955, whisper_loss=0.1, over 3902009.46 frames. ], batch size: 84, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:04:20,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=382480.0, ans=0.125 2024-08-10 05:04:24,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=382480.0, ans=0.0 2024-08-10 05:04:31,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=382580.0, ans=0.0 2024-08-10 05:04:45,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=382680.0, ans=0.0 2024-08-10 05:04:53,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2024-08-10 05:04:55,456 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0 2024-08-10 05:04:57,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=382780.0, ans=0.125 2024-08-10 05:05:05,701 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
28 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 05:05:09,749 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9300, loss[loss=0.106, beats_loss=0.01478, ecapa_loss=0.0002371, whisper_loss=0.0888, over 16497.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01231, ecapa_loss=0.0002931, whisper_loss=0.09964, over 3935003.40 frames. ], batch size: 67, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:05:13,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=382880.0, ans=0.125 2024-08-10 05:05:16,119 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.38 vs. limit=15.0 2024-08-10 05:05:24,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=1.90 vs. limit=15.0 2024-08-10 05:05:26,179 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 3.054e+01 3.480e+01 4.164e+01 1.138e+02, threshold=6.960e+01, percent-clipped=2.0 2024-08-10 05:05:29,295 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 05:05:45,488 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.15 vs. limit=15.0 2024-08-10 05:05:57,065 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 7 from Vox, 31 fro AS 2024-08-10 05:05:58,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=383180.0, ans=0.0 2024-08-10 05:05:59,896 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 05:06:12,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=383280.0, ans=0.125 2024-08-10 05:06:13,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383280.0, ans=0.1 2024-08-10 05:06:18,799 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9350, loss[loss=0.1081, beats_loss=0.01522, ecapa_loss=0.0001962, whisper_loss=0.09087, over 14945.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01234, ecapa_loss=0.000294, whisper_loss=0.09899, over 3902917.40 frames. ], batch size: 57, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:06:20,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=383380.0, ans=0.0 2024-08-10 05:06:22,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=383380.0, ans=0.125 2024-08-10 05:06:56,208 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.06 vs. limit=15.0 2024-08-10 05:07:00,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.33 vs. limit=10.0 2024-08-10 05:07:10,980 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-10 05:07:12,426 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
32 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 05:07:24,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=383780.0, ans=0.125 2024-08-10 05:07:25,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=383780.0, ans=0.125 2024-08-10 05:07:29,248 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9400, loss[loss=0.09781, beats_loss=0.01234, ecapa_loss=0.0003277, whisper_loss=0.08219, over 15206.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01242, ecapa_loss=0.0002932, whisper_loss=0.098, over 3880535.28 frames. ], batch size: 62, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:07:29,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=383880.0, ans=0.0 2024-08-10 05:07:43,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383980.0, ans=0.1 2024-08-10 05:07:45,556 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.317e+01 2.835e+01 3.411e+01 3.975e+01 7.515e+01, threshold=6.823e+01, percent-clipped=1.0 2024-08-10 05:07:47,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=383980.0, ans=0.025 2024-08-10 05:08:22,770 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 05:08:24,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=384280.0, ans=0.125 2024-08-10 05:08:37,810 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9450, loss[loss=0.1106, beats_loss=0.01386, ecapa_loss=0.0002404, whisper_loss=0.09434, over 20517.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01242, ecapa_loss=0.000292, whisper_loss=0.09896, over 3912603.22 frames. 
], batch size: 78, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:09:00,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.87 vs. limit=15.0 2024-08-10 05:09:07,618 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-10 05:09:22,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=384680.0, ans=0.125 2024-08-10 05:09:46,503 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9500, loss[loss=0.09458, beats_loss=0.01381, ecapa_loss=0.0002149, whisper_loss=0.07862, over 18728.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.0124, ecapa_loss=0.0002926, whisper_loss=0.09856, over 3910348.31 frames. ], batch size: 71, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:09:48,054 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 05:09:54,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384880.0, ans=0.1 2024-08-10 05:09:55,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=384880.0, ans=10.0 2024-08-10 05:09:59,546 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 05:10:03,249 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.474e+01 3.035e+01 3.445e+01 3.941e+01 9.468e+01, threshold=6.890e+01, percent-clipped=2.0 2024-08-10 05:10:17,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=385080.0, ans=0.125 2024-08-10 05:10:28,554 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 05:10:54,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=385380.0, ans=0.0 2024-08-10 05:10:55,345 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9550, loss[loss=0.1325, beats_loss=0.0102, ecapa_loss=0.000307, whisper_loss=0.1193, over 21551.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01235, ecapa_loss=0.000291, whisper_loss=0.09878, over 3896462.97 frames. ], batch size: 83, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:10:55,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=385380.0, ans=0.125 2024-08-10 05:11:10,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=385480.0, ans=0.025 2024-08-10 05:11:42,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=385680.0, ans=0.125 2024-08-10 05:12:02,111 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 05:12:04,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9600, loss[loss=0.1002, beats_loss=0.01239, ecapa_loss=0.0003378, whisper_loss=0.08443, over 21479.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01244, ecapa_loss=0.0002916, whisper_loss=0.09816, over 3887986.67 frames. ], batch size: 91, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:12:18,712 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
13 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-10 05:12:19,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=385980.0, ans=0.05 2024-08-10 05:12:21,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 3.010e+01 3.489e+01 4.021e+01 7.106e+01, threshold=6.979e+01, percent-clipped=1.0 2024-08-10 05:12:37,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=386080.0, ans=0.0 2024-08-10 05:12:44,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=386080.0, ans=0.0 2024-08-10 05:12:46,903 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 05:12:47,581 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=15.0 2024-08-10 05:12:50,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=386180.0, ans=0.0 2024-08-10 05:12:55,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=386180.0, ans=0.0 2024-08-10 05:13:11,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=386280.0, ans=0.0 2024-08-10 05:13:14,602 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9650, loss[loss=0.1092, beats_loss=0.01303, ecapa_loss=0.000308, whisper_loss=0.09309, over 17186.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01234, ecapa_loss=0.0002912, whisper_loss=0.09928, over 3868663.51 frames. ], batch size: 69, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:13:16,205 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 05:13:21,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=386380.0, ans=0.0 2024-08-10 05:13:39,971 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-10 05:13:44,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=386580.0, ans=0.2 2024-08-10 05:13:51,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386580.0, ans=0.1 2024-08-10 05:13:54,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=386580.0, ans=0.0 2024-08-10 05:13:58,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.14 vs. limit=22.5 2024-08-10 05:14:01,247 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-10 05:14:04,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=386680.0, ans=0.125 2024-08-10 05:14:05,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=386680.0, ans=0.125 2024-08-10 05:14:24,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9700, loss[loss=0.09857, beats_loss=0.0111, ecapa_loss=0.000346, whisper_loss=0.08402, over 13365.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01235, ecapa_loss=0.0002926, whisper_loss=0.0993, over 3860704.93 frames. 
], batch size: 55, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:14:27,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=386880.0, ans=0.035 2024-08-10 05:14:40,623 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.854e+01 3.317e+01 3.898e+01 6.731e+01, threshold=6.635e+01, percent-clipped=0.0 2024-08-10 05:14:58,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.31 vs. limit=10.0 2024-08-10 05:14:59,068 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-10 05:15:03,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=387080.0, ans=0.0 2024-08-10 05:15:10,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=387180.0, ans=0.1 2024-08-10 05:15:13,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=387180.0, ans=0.0 2024-08-10 05:15:28,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=387280.0, ans=0.125 2024-08-10 05:15:30,059 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 05:15:33,966 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9750, loss[loss=0.1105, beats_loss=0.01395, ecapa_loss=0.0002796, whisper_loss=0.09373, over 20608.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.0123, ecapa_loss=0.0002908, whisper_loss=0.09962, over 3863228.10 frames. ], batch size: 84, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:15:39,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. 
limit=6.0 2024-08-10 05:15:49,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=387480.0, ans=0.125 2024-08-10 05:15:58,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.87 vs. limit=22.5 2024-08-10 05:16:18,394 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 05:16:31,303 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.35 vs. limit=10.0 2024-08-10 05:16:43,178 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9800, loss[loss=0.1127, beats_loss=0.0137, ecapa_loss=0.0002689, whisper_loss=0.09633, over 23086.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01229, ecapa_loss=0.0002896, whisper_loss=0.09996, over 3875220.81 frames. ], batch size: 92, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:16:48,318 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2024-08-10 05:16:59,826 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.835e+01 3.207e+01 3.802e+01 6.736e+01, threshold=6.414e+01, percent-clipped=1.0 2024-08-10 05:17:01,487 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-10 05:17:06,932 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 05:17:07,609 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-10 05:17:11,012 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
29 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 05:17:16,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.45 vs. limit=15.0 2024-08-10 05:17:21,073 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-10 05:17:29,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=388180.0, ans=0.125 2024-08-10 05:17:30,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=388180.0, ans=0.0 2024-08-10 05:17:40,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=388280.0, ans=0.125 2024-08-10 05:17:44,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=388280.0, ans=0.2 2024-08-10 05:17:45,231 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 05:17:51,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9850, loss[loss=0.1094, beats_loss=0.01106, ecapa_loss=0.0002714, whisper_loss=0.09565, over 16986.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01226, ecapa_loss=0.0002915, whisper_loss=0.09955, over 3853201.11 frames. ], batch size: 64, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:17:55,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=388380.0, ans=0.0 2024-08-10 05:18:00,336 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
23 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 05:18:00,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=388380.0, ans=0.125 2024-08-10 05:18:16,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=388480.0, ans=0.125 2024-08-10 05:18:23,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=388580.0, ans=0.0 2024-08-10 05:18:29,400 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-10 05:18:51,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=388780.0, ans=0.0 2024-08-10 05:19:00,754 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9900, loss[loss=0.1182, beats_loss=0.01263, ecapa_loss=0.000295, whisper_loss=0.1026, over 17870.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01223, ecapa_loss=0.0002916, whisper_loss=0.1002, over 3861953.09 frames. ], batch size: 71, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:19:02,298 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 05:19:17,354 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.911e+01 3.357e+01 3.805e+01 2.149e+02, threshold=6.715e+01, percent-clipped=2.0 2024-08-10 05:19:18,739 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
31 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 05:19:34,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=389080.0, ans=0.0 2024-08-10 05:19:46,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2024-08-10 05:19:52,752 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.06 vs. limit=10.0 2024-08-10 05:19:53,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=389180.0, ans=0.125 2024-08-10 05:19:54,196 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0 2024-08-10 05:19:55,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=389280.0, ans=0.125 2024-08-10 05:19:56,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=389280.0, ans=0.125 2024-08-10 05:19:58,321 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.48 vs. limit=22.5 2024-08-10 05:20:10,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 9950, loss[loss=0.1132, beats_loss=0.01005, ecapa_loss=0.0003752, whisper_loss=0.09941, over 17625.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01218, ecapa_loss=0.0002926, whisper_loss=0.1014, over 3869044.19 frames. ], batch size: 72, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:20:17,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.21 vs. 
limit=15.0 2024-08-10 05:20:19,643 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 33 from Vox, 28 fro AS 2024-08-10 05:20:23,890 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-10 05:20:40,890 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.11 vs. limit=15.0 2024-08-10 05:20:41,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.02 vs. limit=15.0 2024-08-10 05:20:49,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=389680.0, ans=0.025 2024-08-10 05:21:07,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=389780.0, ans=0.125 2024-08-10 05:21:07,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389780.0, ans=0.1 2024-08-10 05:21:08,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=389780.0, ans=0.0 2024-08-10 05:21:17,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10000, loss[loss=0.1026, beats_loss=0.01389, ecapa_loss=0.0002779, whisper_loss=0.08595, over 14102.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01223, ecapa_loss=0.0002941, whisper_loss=0.1004, over 3834712.72 frames. 
], batch size: 57, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:21:22,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=389880.0, ans=0.0 2024-08-10 05:21:34,872 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 3.053e+01 3.527e+01 4.199e+01 1.415e+02, threshold=7.054e+01, percent-clipped=3.0 2024-08-10 05:21:50,560 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 05:21:54,607 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 37 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 05:22:19,067 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 05:22:23,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=390280.0, ans=0.125 2024-08-10 05:22:27,118 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10050, loss[loss=0.1148, beats_loss=0.0125, ecapa_loss=0.0003188, whisper_loss=0.09913, over 15509.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01219, ecapa_loss=0.0002942, whisper_loss=0.1002, over 3847050.44 frames. ], batch size: 64, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:22:34,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2024-08-10 05:22:59,076 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 05:23:19,240 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-10 05:23:35,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10100, loss[loss=0.128, beats_loss=0.01309, ecapa_loss=0.0003547, whisper_loss=0.1113, over 14593.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01234, ecapa_loss=0.0002916, whisper_loss=0.09915, over 3854012.49 frames. 
], batch size: 61, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:23:35,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=390880.0, ans=0.1 2024-08-10 05:23:38,421 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 05:23:49,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=390980.0, ans=0.125 2024-08-10 05:23:50,211 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-10 05:23:51,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.903e+01 3.270e+01 3.742e+01 9.283e+01, threshold=6.541e+01, percent-clipped=1.0 2024-08-10 05:23:55,434 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 21 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-10 05:24:05,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.79 vs. limit=22.5 2024-08-10 05:24:06,396 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 05:24:12,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=391080.0, ans=0.0 2024-08-10 05:24:23,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=391180.0, ans=0.125 2024-08-10 05:24:24,617 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
31 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 05:24:30,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=391280.0, ans=0.125 2024-08-10 05:24:44,533 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10150, loss[loss=0.1005, beats_loss=0.0148, ecapa_loss=0.0002333, whisper_loss=0.08338, over 18143.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01238, ecapa_loss=0.0002911, whisper_loss=0.09888, over 3886042.31 frames. ], batch size: 71, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:25:15,752 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2024-08-10 05:25:20,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=391580.0, ans=0.125 2024-08-10 05:25:54,977 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-08-10 05:25:56,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.03 vs. limit=15.0 2024-08-10 05:25:57,010 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10200, loss[loss=0.1211, beats_loss=0.01303, ecapa_loss=0.0002386, whisper_loss=0.1057, over 20202.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01226, ecapa_loss=0.0002914, whisper_loss=0.09951, over 3862837.64 frames. 
], batch size: 78, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:25:57,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=391880.0, ans=0.125 2024-08-10 05:26:01,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391880.0, ans=0.1 2024-08-10 05:26:12,161 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0 2024-08-10 05:26:14,423 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.916e+01 3.286e+01 3.891e+01 7.167e+01, threshold=6.572e+01, percent-clipped=1.0 2024-08-10 05:26:25,182 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-10 05:26:51,389 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.92 vs. limit=15.0 2024-08-10 05:26:59,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=392280.0, ans=0.07 2024-08-10 05:27:12,098 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10250, loss[loss=0.1004, beats_loss=0.01566, ecapa_loss=0.0002297, whisper_loss=0.08241, over 17133.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01228, ecapa_loss=0.0002895, whisper_loss=0.09929, over 3850822.78 frames. ], batch size: 68, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:27:27,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=392480.0, ans=0.0 2024-08-10 05:27:34,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.54 vs. 
limit=15.0 2024-08-10 05:27:53,207 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 05:28:13,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=392780.0, ans=0.125 2024-08-10 05:28:19,096 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-10 05:28:19,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=392780.0, ans=0.125 2024-08-10 05:28:20,494 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 05:28:26,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=392880.0, ans=0.2 2024-08-10 05:28:27,507 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10300, loss[loss=0.1256, beats_loss=0.009863, ecapa_loss=0.0003184, whisper_loss=0.1126, over 22515.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01225, ecapa_loss=0.0002896, whisper_loss=0.09937, over 3840793.36 frames. ], batch size: 90, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:28:46,139 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 3.063e+01 3.413e+01 3.835e+01 1.358e+02, threshold=6.826e+01, percent-clipped=1.0 2024-08-10 05:28:50,732 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 05:29:06,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=54.32 vs. limit=15.0 2024-08-10 05:29:13,821 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 05:29:22,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0 2024-08-10 05:29:31,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=393280.0, ans=0.125 2024-08-10 05:29:45,912 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10350, loss[loss=0.07551, beats_loss=0.01349, ecapa_loss=0.0003113, whisper_loss=0.05891, over 15869.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01231, ecapa_loss=0.0002883, whisper_loss=0.09931, over 3855766.18 frames. ], batch size: 68, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:29:52,784 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 35 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 05:29:57,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.68 vs. limit=12.0 2024-08-10 05:30:03,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.35 vs. limit=6.0 2024-08-10 05:30:05,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=393480.0, ans=0.125 2024-08-10 05:30:13,565 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 05:30:15,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.69 vs. 
limit=10.0 2024-08-10 05:30:22,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=393580.0, ans=10.0 2024-08-10 05:30:30,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=393580.0, ans=0.5 2024-08-10 05:30:39,018 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 19 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-10 05:30:46,704 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 05:30:49,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=393780.0, ans=0.125 2024-08-10 05:30:57,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=393780.0, ans=0.1 2024-08-10 05:31:03,137 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10400, loss[loss=0.1099, beats_loss=0.01298, ecapa_loss=0.0002437, whisper_loss=0.09453, over 23891.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01232, ecapa_loss=0.0002868, whisper_loss=0.09835, over 3864783.87 frames. ], batch size: 95, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:31:09,092 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 05:31:20,364 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.313e+01 2024-08-10 05:31:21,074 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.940e+01 3.359e+01 3.808e+01 2.361e+02, threshold=6.718e+01, percent-clipped=2.0 2024-08-10 05:31:28,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=393980.0, ans=0.0 2024-08-10 05:31:37,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=394080.0, ans=0.2 2024-08-10 05:31:38,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394080.0, ans=0.1 2024-08-10 05:31:40,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=394080.0, ans=0.0 2024-08-10 05:31:57,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=394180.0, ans=0.125 2024-08-10 05:32:00,296 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 05:32:00,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=394280.0, ans=0.125 2024-08-10 05:32:06,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=394280.0, ans=0.0 2024-08-10 05:32:12,579 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.48 vs. limit=22.5 2024-08-10 05:32:15,646 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.40 vs. 
limit=15.0 2024-08-10 05:32:16,078 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10450, loss[loss=0.1472, beats_loss=0.007155, ecapa_loss=0.0003005, whisper_loss=0.1371, over 18482.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01221, ecapa_loss=0.000287, whisper_loss=0.09831, over 3840770.51 frames. ], batch size: 69, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:32:22,267 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 05:32:33,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=394480.0, ans=0.2 2024-08-10 05:32:39,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=394480.0, ans=0.125 2024-08-10 05:32:45,510 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 05:33:08,991 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.87 vs. limit=22.5 2024-08-10 05:33:26,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=394780.0, ans=0.0 2024-08-10 05:33:29,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=394780.0, ans=0.0 2024-08-10 05:33:31,279 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10500, loss[loss=0.1261, beats_loss=0.01071, ecapa_loss=0.0003119, whisper_loss=0.1123, over 18846.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01228, ecapa_loss=0.0002871, whisper_loss=0.09801, over 3806226.96 frames. 
], batch size: 72, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:33:31,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=394880.0, ans=0.125 2024-08-10 05:33:37,273 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 05:33:45,039 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 05:33:49,539 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.533e+01 2.971e+01 3.381e+01 3.721e+01 5.999e+01, threshold=6.761e+01, percent-clipped=0.0 2024-08-10 05:33:50,958 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-10 05:34:14,627 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.25 vs. limit=22.5 2024-08-10 05:34:35,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=395280.0, ans=0.125 2024-08-10 05:34:35,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=395280.0, ans=0.0 2024-08-10 05:34:38,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=395280.0, ans=0.125 2024-08-10 05:34:39,855 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 31 from Vox, 41 fro AS 2024-08-10 05:34:46,761 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10550, loss[loss=0.08866, beats_loss=0.01429, ecapa_loss=0.0003255, whisper_loss=0.07111, over 21008.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01232, ecapa_loss=0.0002889, whisper_loss=0.09804, over 3830772.71 frames. 
], batch size: 90, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:34:52,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=395380.0, ans=0.125 2024-08-10 05:34:53,916 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 05:35:02,823 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 05:35:06,175 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 05:35:14,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=395480.0, ans=0.2 2024-08-10 05:35:34,892 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 14 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-10 05:35:48,860 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 05:35:49,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.61 vs. limit=22.5 2024-08-10 05:35:53,363 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 05:36:02,152 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10600, loss[loss=0.09867, beats_loss=0.01233, ecapa_loss=0.0004143, whisper_loss=0.0822, over 12166.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.0124, ecapa_loss=0.0002869, whisper_loss=0.09751, over 3854975.90 frames. ], batch size: 54, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:36:04,418 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.99 vs. 
limit=10.0 2024-08-10 05:36:18,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=395980.0, ans=0.1 2024-08-10 05:36:19,680 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.978e+01 3.470e+01 3.932e+01 9.831e+01, threshold=6.940e+01, percent-clipped=1.0 2024-08-10 05:36:27,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=12.0 2024-08-10 05:36:36,681 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 05:36:37,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.93 vs. limit=22.5 2024-08-10 05:36:39,489 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.23 vs. limit=12.0 2024-08-10 05:36:41,934 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.22 vs. limit=15.0 2024-08-10 05:36:45,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=12.0 2024-08-10 05:36:49,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=396180.0, ans=0.1 2024-08-10 05:36:56,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=396180.0, ans=0.125 2024-08-10 05:37:17,807 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10650, loss[loss=0.1321, beats_loss=0.01234, ecapa_loss=0.0002937, whisper_loss=0.1168, over 22972.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01233, ecapa_loss=0.0002852, whisper_loss=0.09882, over 3854508.47 frames. 
], batch size: 91, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:37:18,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=396380.0, ans=0.0 2024-08-10 05:37:22,442 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-10 05:37:29,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.16 vs. limit=15.0 2024-08-10 05:37:37,478 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-10 05:38:12,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=396680.0, ans=0.05 2024-08-10 05:38:26,706 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 05:38:28,932 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2024-08-10 05:38:31,047 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 20 from Vox, 52 fro AS 2024-08-10 05:38:32,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10700, loss[loss=0.09577, beats_loss=0.01551, ecapa_loss=0.0002159, whisper_loss=0.0781, over 23321.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01233, ecapa_loss=0.0002837, whisper_loss=0.09866, over 3855282.27 frames. 
], batch size: 94, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:38:32,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=396880.0, ans=0.125 2024-08-10 05:38:48,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396980.0, ans=0.1 2024-08-10 05:38:49,702 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.532e+01 3.168e+01 3.517e+01 4.154e+01 8.442e+01, threshold=7.034e+01, percent-clipped=1.0 2024-08-10 05:38:57,938 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 05:39:16,285 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 05:39:22,639 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.843e-01 2024-08-10 05:39:28,067 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 05:39:42,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=397280.0, ans=0.0 2024-08-10 05:39:47,660 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10750, loss[loss=0.1295, beats_loss=0.0109, ecapa_loss=0.0003143, whisper_loss=0.1154, over 22628.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.0124, ecapa_loss=0.0002843, whisper_loss=0.09845, over 3869972.44 frames. ], batch size: 93, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:40:00,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=397380.0, ans=0.125 2024-08-10 05:40:04,976 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.31 vs. 
limit=22.5 2024-08-10 05:40:12,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=397480.0, ans=0.2 2024-08-10 05:40:18,448 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 05:40:21,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=397580.0, ans=0.2 2024-08-10 05:40:27,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=397580.0, ans=0.0 2024-08-10 05:40:42,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=397680.0, ans=0.0 2024-08-10 05:41:02,535 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10800, loss[loss=0.1422, beats_loss=0.01185, ecapa_loss=0.0002765, whisper_loss=0.1275, over 23054.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01241, ecapa_loss=0.0002829, whisper_loss=0.09907, over 3877193.70 frames. ], batch size: 92, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:41:09,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397880.0, ans=0.1 2024-08-10 05:41:10,328 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 05:41:12,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=397880.0, ans=0.0 2024-08-10 05:41:12,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.74 vs. 
limit=15.0
2024-08-10 05:41:20,215 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.457e+01 2.911e+01 3.259e+01 3.950e+01 6.115e+01, threshold=6.518e+01, percent-clipped=0.0
2024-08-10 05:41:27,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=397980.0, ans=0.125
2024-08-10 05:41:32,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=398080.0, ans=0.07
2024-08-10 05:42:06,804 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 17 from Vox, 24 from AS
2024-08-10 05:42:18,486 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10850, loss[loss=0.08497, beats_loss=0.01621, ecapa_loss=0.0002757, whisper_loss=0.066, over 21054.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01235, ecapa_loss=0.0002849, whisper_loss=0.09948, over 3910075.91 frames. ], batch size: 88, lr: 1.80e-02, grad_scale: 16777216.0
2024-08-10 05:42:30,930 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 26 from LS+wenet, 12 from Vox, 26 from AS
2024-08-10 05:42:39,815 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 18 from Vox, 38 from AS
2024-08-10 05:42:48,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=398580.0, ans=0.1
2024-08-10 05:42:54,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0
2024-08-10 05:42:55,184 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 from AS
2024-08-10 05:42:57,366 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.191e+00
2024-08-10 05:43:13,719 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 from AS
2024-08-10 05:43:31,401 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 from AS
2024-08-10 05:43:34,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=398880.0, ans=0.2
2024-08-10 05:43:35,513 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10900, loss[loss=0.1296, beats_loss=0.01383, ecapa_loss=0.0002481, whisper_loss=0.1133, over 16958.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01241, ecapa_loss=0.0002832, whisper_loss=0.09957, over 3918438.03 frames. ], batch size: 65, lr: 1.80e-02, grad_scale: 16777216.0
2024-08-10 05:43:40,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=398880.0, ans=0.125
2024-08-10 05:43:52,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=398980.0, ans=0.125
2024-08-10 05:43:53,650 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 3.145e+01 3.517e+01 3.996e+01 1.577e+02, threshold=7.034e+01, percent-clipped=2.0
2024-08-10 05:44:12,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=399080.0, ans=0.125
2024-08-10 05:44:14,383 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS
2024-08-10 05:44:34,340 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=22.5
2024-08-10 05:44:39,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=399280.0, ans=0.125
2024-08-10 05:44:43,238 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 14 from Vox, 20 from AS
2024-08-10 05:44:49,854 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 10950, loss[loss=0.1279, beats_loss=0.01047, ecapa_loss=0.0003311, whisper_loss=0.1141, over 22362.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01238, ecapa_loss=0.000286, whisper_loss=0.0998, over 3937711.85 frames. ], batch size: 89, lr: 1.80e-02, grad_scale: 16777216.0
2024-08-10 05:45:02,494 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 12 from Vox, 36 from AS
2024-08-10 05:45:04,736 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0
2024-08-10 05:45:05,467 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 from AS
2024-08-10 05:45:18,143 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 19 from Vox, 35 from AS
2024-08-10 05:45:18,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=399580.0, ans=0.2
2024-08-10 05:45:43,033 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 22 from Vox, 20 from AS
2024-08-10 05:45:43,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=399680.0, ans=0.0
2024-08-10 05:45:49,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=399780.0, ans=0.125
2024-08-10 05:45:52,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=399780.0, ans=0.125
2024-08-10 05:45:53,856 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 16 from Vox, 29 from AS
2024-08-10 05:46:02,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=399780.0, ans=0.125
2024-08-10 05:46:05,977 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11000, loss[loss=0.1147, beats_loss=0.01343, ecapa_loss=0.0003249, whisper_loss=0.09806, over 20902.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01242, ecapa_loss=0.0002879, whisper_loss=0.09959, over 3926846.53 frames. ], batch size: 88, lr: 1.80e-02, grad_scale: 16777216.0
2024-08-10 05:46:09,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=399880.0, ans=0.0
2024-08-10 05:46:12,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399880.0, ans=0.1
2024-08-10 05:46:18,435 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 26 from Vox, 31 from AS
2024-08-10 05:46:18,771 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-10 05:46:26,640 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.898e+01 3.404e+01 3.976e+01 6.521e+01, threshold=6.808e+01, percent-clipped=0.0
2024-08-10 05:46:28,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399980.0, ans=0.1
2024-08-10 05:46:34,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=399980.0, ans=0.04949747468305833
2024-08-10 05:46:45,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=400080.0, ans=10.0
2024-08-10 05:47:21,026 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 from AS
2024-08-10 05:47:21,968 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11050, loss[loss=0.1224, beats_loss=0.01222, ecapa_loss=0.0002893, whisper_loss=0.1073, over 23539.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01236, ecapa_loss=0.0002889, whisper_loss=0.09932, over 3946691.57 frames. ], batch size: 94, lr: 1.80e-02, grad_scale: 33554432.0
2024-08-10 05:47:22,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=400380.0, ans=0.0
2024-08-10 05:47:29,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=400380.0, ans=0.0
2024-08-10 05:47:56,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400580.0, ans=0.1
2024-08-10 05:48:06,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=400680.0, ans=0.125
2024-08-10 05:48:25,864 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 from AS
2024-08-10 05:48:34,869 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11100, loss[loss=0.1307, beats_loss=0.01257, ecapa_loss=0.0003042, whisper_loss=0.1151, over 21634.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01233, ecapa_loss=0.0002899, whisper_loss=0.09998, over 3939985.58 frames. ], batch size: 87, lr: 1.80e-02, grad_scale: 33554432.0
2024-08-10 05:48:41,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.81 vs. limit=10.0
2024-08-10 05:48:43,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.28 vs. limit=15.0
2024-08-10 05:48:52,836 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.998e+01 3.322e+01 3.680e+01 7.626e+01, threshold=6.644e+01, percent-clipped=1.0
2024-08-10 05:49:03,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=401080.0, ans=0.125
2024-08-10 05:49:06,843 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 15 from LS+wenet, 22 from Vox, 33 from AS
2024-08-10 05:49:19,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=401180.0, ans=0.1
2024-08-10 05:49:26,438 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.75 vs. limit=15.0
2024-08-10 05:49:41,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.83 vs. limit=15.0
2024-08-10 05:49:42,692 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 from AS
2024-08-10 05:49:45,408 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 21 from Vox, 46 from AS
2024-08-10 05:49:47,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=401280.0, ans=0.125
2024-08-10 05:49:50,136 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11150, loss[loss=0.1236, beats_loss=0.009803, ecapa_loss=0.0003233, whisper_loss=0.1106, over 22291.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01227, ecapa_loss=0.0002892, whisper_loss=0.09982, over 3944726.70 frames. ], batch size: 89, lr: 1.80e-02, grad_scale: 33554432.0
2024-08-10 05:50:28,358 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 from AS
2024-08-10 05:50:33,482 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.341e-01
2024-08-10 05:50:41,649 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.08 vs. limit=15.0
2024-08-10 05:50:43,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=401680.0, ans=0.125
2024-08-10 05:50:57,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0
2024-08-10 05:51:01,883 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11200, loss[loss=0.0899, beats_loss=0.01548, ecapa_loss=0.0003653, whisper_loss=0.07077, over 21483.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01228, ecapa_loss=0.0002881, whisper_loss=0.1, over 3932960.13 frames. ], batch size: 97, lr: 1.80e-02, grad_scale: 33554432.0
2024-08-10 05:51:04,843 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 24 from Vox, 38 from AS
2024-08-10 05:51:13,642 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 26 from Vox, 24 from AS
2024-08-10 05:51:18,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 3.098e+01 3.521e+01 4.109e+01 7.831e+01, threshold=7.041e+01, percent-clipped=1.0
2024-08-10 05:51:19,113 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 25 from Vox, 30 from AS
2024-08-10 05:51:26,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=401980.0, ans=0.125
2024-08-10 05:51:35,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.63 vs. limit=22.5
2024-08-10 05:51:50,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=402180.0, ans=0.125
2024-08-10 05:52:08,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=402280.0, ans=0.0
2024-08-10 05:52:15,219 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 from AS
2024-08-10 05:52:16,320 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11250, loss[loss=0.104, beats_loss=0.01233, ecapa_loss=0.0002949, whisper_loss=0.08877, over 21724.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01228, ecapa_loss=0.0002879, whisper_loss=0.09954, over 3899614.48 frames. ], batch size: 88, lr: 1.79e-02, grad_scale: 33554432.0
2024-08-10 05:52:22,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=402380.0, ans=0.2
2024-08-10 05:52:25,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=402380.0, ans=22.5
2024-08-10 05:52:31,040 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 21 from Vox, 38 from AS
2024-08-10 05:52:33,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402480.0, ans=0.1
2024-08-10 05:53:26,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=402880.0, ans=0.0
2024-08-10 05:53:27,608 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11300, loss[loss=0.116, beats_loss=0.01097, ecapa_loss=0.0003171, whisper_loss=0.1019, over 19430.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.0122, ecapa_loss=0.0002867, whisper_loss=0.09983, over 3894273.20 frames. ], batch size: 79, lr: 1.79e-02, grad_scale: 33554432.0
2024-08-10 05:53:27,837 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 29 from Vox, 41 from AS
2024-08-10 05:53:44,535 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.975e+01 3.482e+01 3.976e+01 1.269e+02, threshold=6.963e+01, percent-clipped=1.0
2024-08-10 05:53:45,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=402980.0, ans=0.125
2024-08-10 05:53:50,505 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 27 from Vox, 30 from AS
2024-08-10 05:54:02,957 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 15 from Vox, 38 from AS
2024-08-10 05:54:04,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=403080.0, ans=0.125
2024-08-10 05:54:07,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=403080.0, ans=0.2
2024-08-10 05:54:17,020 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 18 from Vox, 30 from AS
2024-08-10 05:54:20,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=403180.0, ans=0.1
2024-08-10 05:54:21,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=403180.0, ans=0.125
2024-08-10 05:54:22,942 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 from AS
2024-08-10 05:54:28,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.07 vs. limit=22.5
2024-08-10 05:54:34,030 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0
2024-08-10 05:54:39,539 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11350, loss[loss=0.1181, beats_loss=0.01409, ecapa_loss=0.0002676, whisper_loss=0.1014, over 22427.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01216, ecapa_loss=0.0002863, whisper_loss=0.1003, over 3906956.05 frames. ], batch size: 91, lr: 1.79e-02, grad_scale: 33554432.0
2024-08-10 05:55:01,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=403480.0, ans=0.0
2024-08-10 05:55:20,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.03 vs. limit=22.5
2024-08-10 05:55:23,971 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 21 from Vox, 33 from AS
2024-08-10 05:55:26,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.83 vs. limit=15.0
2024-08-10 05:55:31,796 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 from AS
2024-08-10 05:55:34,473 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 from AS
2024-08-10 05:55:37,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=403680.0, ans=0.0
2024-08-10 05:55:55,423 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11400, loss[loss=0.0988, beats_loss=0.01249, ecapa_loss=0.0002865, whisper_loss=0.08344, over 13817.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.0122, ecapa_loss=0.0002864, whisper_loss=0.1009, over 3938474.52 frames. ], batch size: 55, lr: 1.79e-02, grad_scale: 33554432.0
2024-08-10 05:56:13,379 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.448e+01 3.091e+01 3.465e+01 3.981e+01 8.996e+01, threshold=6.931e+01, percent-clipped=1.0
2024-08-10 05:56:15,012 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 from AS
2024-08-10 05:56:22,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=403980.0, ans=0.125
2024-08-10 05:56:28,314 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 from AS
2024-08-10 05:56:36,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.52 vs. limit=15.0
2024-08-10 05:56:38,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=404180.0, ans=0.125
2024-08-10 05:56:55,875 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 from AS
2024-08-10 05:57:08,983 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11450, loss[loss=0.1048, beats_loss=0.01397, ecapa_loss=0.0002696, whisper_loss=0.08818, over 13964.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01224, ecapa_loss=0.000287, whisper_loss=0.1006, over 3931098.14 frames. ], batch size: 54, lr: 1.79e-02, grad_scale: 33554432.0
2024-08-10 05:57:15,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404380.0, ans=0.1
2024-08-10 05:57:49,257 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0
2024-08-10 05:57:52,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=404580.0, ans=22.5
2024-08-10 05:58:01,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404680.0, ans=0.1
2024-08-10 05:58:04,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0
2024-08-10 05:58:12,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0
2024-08-10 05:58:24,779 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11500, loss[loss=0.1254, beats_loss=0.01515, ecapa_loss=0.0001904, whisper_loss=0.1083, over 17224.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01221, ecapa_loss=0.0002862, whisper_loss=0.1006, over 3900230.14 frames. ], batch size: 62, lr: 1.79e-02, grad_scale: 33554432.0
2024-08-10 05:58:27,830 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 from AS
2024-08-10 05:58:39,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=404980.0, ans=0.125
2024-08-10 05:58:42,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=404980.0, ans=0.125
2024-08-10 05:58:42,810 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.645e+01 3.195e+01 3.620e+01 4.078e+01 2.789e+02, threshold=7.241e+01, percent-clipped=1.0
2024-08-10 05:58:50,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0
2024-08-10 05:58:51,878 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.45 vs. limit=15.0
2024-08-10 05:59:15,292 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 from AS
2024-08-10 05:59:24,373 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.92 vs. limit=15.0
2024-08-10 05:59:25,544 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.895e+00
2024-08-10 05:59:34,137 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 from AS
2024-08-10 05:59:34,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=405280.0, ans=0.1
2024-08-10 05:59:38,215 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11550, loss[loss=0.1097, beats_loss=0.01376, ecapa_loss=0.0003217, whisper_loss=0.09272, over 22327.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01221, ecapa_loss=0.000287, whisper_loss=0.1007, over 3904045.27 frames. ], batch size: 92, lr: 1.79e-02, grad_scale: 33554432.0
2024-08-10 05:59:41,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=405380.0, ans=0.05
2024-08-10 05:59:42,800 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 from AS
2024-08-10 05:59:49,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=405380.0, ans=0.5
2024-08-10 05:59:53,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=405480.0, ans=0.0
2024-08-10 06:00:06,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=405480.0, ans=0.1
2024-08-10 06:00:06,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=405480.0, ans=0.2
2024-08-10 06:00:07,153 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 from AS
2024-08-10 06:00:21,122 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 25 from Vox, 28 from AS
2024-08-10 06:00:23,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=405680.0, ans=0.125
2024-08-10 06:00:24,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=405680.0, ans=0.125
2024-08-10 06:00:26,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=405680.0, ans=0.125
2024-08-10 06:00:27,865 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 19 from LS+wenet, 25 from Vox, 40 from AS
2024-08-10 06:00:38,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=405780.0, ans=0.125
2024-08-10 06:00:46,920 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 from AS
2024-08-10 06:00:54,262 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11600, loss[loss=0.1361, beats_loss=0.01108, ecapa_loss=0.0002859, whisper_loss=0.1222, over 21179.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01216, ecapa_loss=0.0002865, whisper_loss=0.1009, over 3927602.01 frames. ], batch size: 82, lr: 1.79e-02, grad_scale: 33554432.0
2024-08-10 06:01:04,717 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 21 from Vox, 38 from AS
2024-08-10 06:01:11,932 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.511e+01 3.361e+01 3.673e+01 4.425e+01 6.331e+01, threshold=7.346e+01, percent-clipped=0.0
2024-08-10 06:01:24,438 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.022e-01
2024-08-10 06:01:25,548 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 from AS
2024-08-10 06:01:32,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=406080.0, ans=0.0
2024-08-10 06:01:36,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=406180.0, ans=0.0
2024-08-10 06:01:50,912 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 28 from Vox, 37 from AS
2024-08-10 06:02:02,352 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 from AS
2024-08-10 06:02:06,714 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11650, loss[loss=0.1079, beats_loss=0.01339, ecapa_loss=0.0002233, whisper_loss=0.0923, over 23524.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01228, ecapa_loss=0.0002873, whisper_loss=0.09951, over 3918010.15 frames. ], batch size: 92, lr: 1.79e-02, grad_scale: 33554432.0
2024-08-10 06:02:29,197 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 12 from Vox, 28 from AS
2024-08-10 06:02:33,713 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 32 from Vox, 33 from AS
2024-08-10 06:02:36,237 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 from AS
2024-08-10 06:02:39,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=406580.0, ans=0.0
2024-08-10 06:03:15,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=406880.0, ans=0.0
2024-08-10 06:03:16,727 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11700, loss[loss=0.1037, beats_loss=0.01497, ecapa_loss=0.0002734, whisper_loss=0.08601, over 21531.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01243, ecapa_loss=0.0002852, whisper_loss=0.09919, over 3935849.89 frames. ], batch size: 89, lr: 1.79e-02, grad_scale: 33554432.0
2024-08-10 06:03:21,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=406880.0, ans=0.125
2024-08-10 06:03:28,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=406880.0, ans=0.0
2024-08-10 06:03:31,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=406980.0, ans=0.2
2024-08-10 06:03:33,631 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.451e+01 3.237e+01 3.576e+01 4.266e+01 6.520e+01, threshold=7.151e+01, percent-clipped=0.0
2024-08-10 06:03:36,629 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 from AS
2024-08-10 06:03:36,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=406980.0, ans=0.125
2024-08-10 06:03:42,489 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 from AS
2024-08-10 06:04:02,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=407180.0, ans=0.2
2024-08-10 06:04:15,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=407280.0, ans=0.125
2024-08-10 06:04:24,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=407280.0, ans=0.025
2024-08-10 06:04:28,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=407380.0, ans=0.0
2024-08-10 06:04:28,998 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11750, loss[loss=0.09926, beats_loss=0.01509, ecapa_loss=0.0002665, whisper_loss=0.08151, over 22099.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01248, ecapa_loss=0.000284, whisper_loss=0.09876, over 3941355.22 frames. ], batch size: 90, lr: 1.78e-02, grad_scale: 33554432.0
2024-08-10 06:04:31,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0
2024-08-10 06:04:42,255 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 23 from Vox, 34 from AS
2024-08-10 06:04:43,806 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 16 from LS+wenet, 22 from Vox, 37 from AS
2024-08-10 06:04:46,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0
2024-08-10 06:04:51,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=407480.0, ans=0.1
2024-08-10 06:05:05,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=407580.0, ans=0.0
2024-08-10 06:05:20,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=407680.0, ans=0.09899494936611666
2024-08-10 06:05:21,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=407680.0, ans=0.5
2024-08-10 06:05:23,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=407680.0, ans=0.09899494936611666
2024-08-10 06:05:43,079 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11800, loss[loss=0.1085, beats_loss=0.01181, ecapa_loss=0.0003198, whisper_loss=0.0935, over 20585.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01245, ecapa_loss=0.000286, whisper_loss=0.09852, over 3956610.76 frames. ], batch size: 84, lr: 1.78e-02, grad_scale: 33554432.0
2024-08-10 06:05:51,416 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 20 from Vox, 46 from AS
2024-08-10 06:05:59,307 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.497e+01 3.074e+01 3.455e+01 3.897e+01 7.543e+01, threshold=6.910e+01, percent-clipped=1.0
2024-08-10 06:06:11,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0
2024-08-10 06:06:22,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=408080.0, ans=0.0
2024-08-10 06:06:32,345 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 26 from Vox, 31 from AS
2024-08-10 06:06:32,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=408180.0, ans=0.0
2024-08-10 06:06:42,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=408280.0, ans=0.1
2024-08-10 06:06:45,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408280.0, ans=0.1
2024-08-10 06:06:53,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11850, loss[loss=0.1191, beats_loss=0.00879, ecapa_loss=0.0004284, whisper_loss=0.106, over 14029.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01243, ecapa_loss=0.0002884, whisper_loss=0.09891, over 3948651.10 frames. ], batch size: 61, lr: 1.78e-02, grad_scale: 33554432.0
2024-08-10 06:06:58,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=408380.0, ans=0.125
2024-08-10 06:06:58,985 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0
2024-08-10 06:07:10,460 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 33 from LS+wenet, 21 from Vox, 42 from AS
2024-08-10 06:07:31,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=408580.0, ans=0.09899494936611666
2024-08-10 06:07:42,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=408680.0, ans=0.125
2024-08-10 06:07:51,666 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.090e-02
2024-08-10 06:08:01,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=408780.0, ans=0.125
2024-08-10 06:08:03,991 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11900, loss[loss=0.1001, beats_loss=0.01365, ecapa_loss=0.0002607, whisper_loss=0.08387, over 22514.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01248, ecapa_loss=0.0002886, whisper_loss=0.09805, over 3954789.32 frames. ], batch size: 92, lr: 1.78e-02, grad_scale: 33554432.0
2024-08-10 06:08:14,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0
2024-08-10 06:08:15,234 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 from AS
2024-08-10 06:08:16,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=408980.0, ans=0.125
2024-08-10 06:08:18,045 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 37 from LS+wenet, 22 from Vox, 31 from AS
2024-08-10 06:08:20,520 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 3.266e+01 3.553e+01 4.247e+01 1.215e+02, threshold=7.106e+01, percent-clipped=1.0
2024-08-10 06:08:21,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.35 vs. limit=10.0
2024-08-10 06:08:24,798 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 from AS
2024-08-10 06:08:33,136 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 16 from Vox, 37 from AS
2024-08-10 06:08:37,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.04 vs. limit=22.5
2024-08-10 06:08:40,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=409080.0, ans=0.125
2024-08-10 06:08:43,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.56 vs. limit=10.0
2024-08-10 06:08:44,231 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 28 from Vox, 38 from AS
2024-08-10 06:08:45,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=409180.0, ans=0.2
2024-08-10 06:08:52,886 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 from AS
2024-08-10 06:08:59,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=409280.0, ans=0.1
2024-08-10 06:08:59,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=409280.0, ans=0.125
2024-08-10 06:09:00,995 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 from AS
2024-08-10 06:09:01,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=409280.0, ans=0.09899494936611666
2024-08-10 06:09:13,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 11950, loss[loss=0.1261, beats_loss=0.01109, ecapa_loss=0.0002693, whisper_loss=0.1123, over 22609.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01237, ecapa_loss=0.0002877, whisper_loss=0.09893, over 3927530.58 frames. ], batch size: 90, lr: 1.78e-02, grad_scale: 33554432.0
2024-08-10 06:09:15,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.37 vs. limit=15.0
2024-08-10 06:09:26,920 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0
2024-08-10 06:09:31,743 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 from AS
2024-08-10 06:10:01,065 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 14 from Vox, 38 from AS
2024-08-10 06:10:11,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=409780.0, ans=0.125
2024-08-10 06:10:16,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=409780.0, ans=0.05
2024-08-10 06:10:16,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=409780.0, ans=0.0
2024-08-10 06:10:22,762 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12000, loss[loss=0.09091, beats_loss=0.01438, ecapa_loss=0.0002359, whisper_loss=0.07418, over 14783.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01228, ecapa_loss=0.0002901, whisper_loss=0.09927, over 3921650.86 frames.
], batch size: 57, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:10:22,762 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 06:10:59,522 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.9624, 2.8350, 2.7740, 3.0960], device='cuda:2') 2024-08-10 06:11:01,933 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on ASR_libri: loss=0.2695, beats_loss=0, ecapa_loss=0.000863, whisper_loss=0.2608, over 922467.00 frames. 2024-08-10 06:11:17,767 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on SV_voxceleb1: loss=0.007635, beats_loss=0, ecapa_loss=0.0007635, whisper_loss=0, over 939242.00 frames. 2024-08-10 06:11:58,775 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0549, 1.8882, 1.5770, 1.7611], device='cuda:2') 2024-08-10 06:13:11,109 INFO [train_multi_KD3.py:1149] (2/4) Epoch 3, validation on AT_audioset: loss=0.0284, beats_loss=0.0284, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 06:13:11,113 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 06:13:11,379 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 06:13:15,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=409880.0, ans=0.0 2024-08-10 06:13:21,368 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 06:13:23,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=409880.0, ans=0.1 2024-08-10 06:13:28,234 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.155e+01 3.494e+01 4.116e+01 7.765e+01, threshold=6.989e+01, percent-clipped=1.0 2024-08-10 06:14:00,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=410180.0, ans=0.2 2024-08-10 06:14:23,362 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12050, loss[loss=0.1367, beats_loss=0.01001, ecapa_loss=0.0002945, whisper_loss=0.1237, over 22932.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01219, ecapa_loss=0.0002912, whisper_loss=0.09927, over 3889396.95 frames. ], batch size: 93, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:14:23,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=410380.0, ans=0.125 2024-08-10 06:14:42,626 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 06:15:05,407 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 06:15:09,350 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 06:15:33,092 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12100, loss[loss=0.09355, beats_loss=0.01116, ecapa_loss=0.0003547, whisper_loss=0.07883, over 13781.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01226, ecapa_loss=0.0002905, whisper_loss=0.09902, over 3911380.15 frames. ], batch size: 58, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:15:44,173 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 06:15:49,069 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=15.0 2024-08-10 06:15:49,336 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.418e+01 3.160e+01 3.535e+01 4.240e+01 9.123e+01, threshold=7.071e+01, percent-clipped=3.0 2024-08-10 06:15:49,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=410980.0, ans=0.0 2024-08-10 06:15:55,284 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-10 06:16:21,906 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.18 vs. limit=22.5 2024-08-10 06:16:41,660 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12150, loss[loss=0.1029, beats_loss=0.01132, ecapa_loss=0.0002903, whisper_loss=0.08872, over 18364.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01221, ecapa_loss=0.0002906, whisper_loss=0.09882, over 3884481.03 frames. ], batch size: 75, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:16:57,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=411480.0, ans=0.125 2024-08-10 06:17:07,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=22.08 vs. limit=15.0 2024-08-10 06:17:09,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=411580.0, ans=0.125 2024-08-10 06:17:13,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=411580.0, ans=0.125 2024-08-10 06:17:17,819 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-10 06:17:31,824 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 31 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 06:17:37,501 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 06:17:38,910 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 06:17:50,797 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12200, loss[loss=0.1221, beats_loss=0.01144, ecapa_loss=0.0003241, whisper_loss=0.1074, over 13393.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01229, ecapa_loss=0.0002879, whisper_loss=0.09835, over 3880444.33 frames. ], batch size: 57, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:18:07,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.664e+01 3.192e+01 3.663e+01 4.187e+01 6.724e+01, threshold=7.326e+01, percent-clipped=0.0 2024-08-10 06:18:23,304 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 06:18:25,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=412080.0, ans=0.0 2024-08-10 06:18:46,375 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 06:18:46,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.35 vs. limit=15.0 2024-08-10 06:19:03,415 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12250, loss[loss=0.1134, beats_loss=0.01263, ecapa_loss=0.0002483, whisper_loss=0.09829, over 21085.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01223, ecapa_loss=0.0002904, whisper_loss=0.09877, over 3907725.77 frames. ], batch size: 83, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:19:16,084 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
38 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 06:19:17,793 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.10 vs. limit=15.0 2024-08-10 06:19:19,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=412480.0, ans=0.0 2024-08-10 06:19:23,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=412480.0, ans=0.0 2024-08-10 06:19:24,528 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 06:19:46,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=412680.0, ans=0.125 2024-08-10 06:19:48,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=412680.0, ans=0.125 2024-08-10 06:20:12,513 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12300, loss[loss=0.1211, beats_loss=0.01044, ecapa_loss=0.0002217, whisper_loss=0.1085, over 17023.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01224, ecapa_loss=0.0002893, whisper_loss=0.09847, over 3909255.65 frames. ], batch size: 61, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:20:28,720 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.503e+01 3.356e+01 3.807e+01 4.575e+01 1.219e+02, threshold=7.614e+01, percent-clipped=2.0 2024-08-10 06:21:19,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=413280.0, ans=0.0 2024-08-10 06:21:21,634 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12350, loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0003044, whisper_loss=0.08881, over 18771.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01222, ecapa_loss=0.0002889, whisper_loss=0.09883, over 3926873.11 frames. 
], batch size: 75, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:21:55,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2024-08-10 06:21:55,959 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 06:22:05,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.71 vs. limit=15.0 2024-08-10 06:22:12,272 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 06:22:30,409 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12400, loss[loss=0.1326, beats_loss=0.01167, ecapa_loss=0.0002929, whisper_loss=0.118, over 22761.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01221, ecapa_loss=0.0002863, whisper_loss=0.09843, over 3934937.17 frames. ], batch size: 91, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:22:32,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=12.0 2024-08-10 06:22:45,887 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 06:22:47,296 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 3.123e+01 3.503e+01 4.019e+01 1.294e+02, threshold=7.007e+01, percent-clipped=1.0 2024-08-10 06:22:51,566 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 17 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 06:22:55,416 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
26 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 06:23:13,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=414180.0, ans=0.0 2024-08-10 06:23:14,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=414180.0, ans=0.125 2024-08-10 06:23:29,061 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.915e-01 2024-08-10 06:23:39,481 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12450, loss[loss=0.122, beats_loss=0.01226, ecapa_loss=0.0002672, whisper_loss=0.1071, over 14197.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01224, ecapa_loss=0.0002872, whisper_loss=0.09828, over 3910638.03 frames. ], batch size: 55, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:23:43,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=414380.0, ans=0.0 2024-08-10 06:23:59,110 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 06:24:23,318 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 06:24:25,946 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-10 06:24:26,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=414680.0, ans=0.0 2024-08-10 06:24:30,050 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 06:24:49,291 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12500, loss[loss=0.1398, beats_loss=0.01138, ecapa_loss=0.000261, whisper_loss=0.1259, over 22742.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01231, ecapa_loss=0.0002863, whisper_loss=0.09782, over 3899912.34 frames. 
], batch size: 90, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:24:49,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=414880.0, ans=0.125 2024-08-10 06:25:06,186 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.398e+01 3.279e+01 3.697e+01 4.212e+01 5.815e+01, threshold=7.393e+01, percent-clipped=0.0 2024-08-10 06:25:27,752 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-10 06:25:48,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=415280.0, ans=0.0 2024-08-10 06:25:58,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=415280.0, ans=0.015 2024-08-10 06:26:00,632 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=12.0 2024-08-10 06:26:01,450 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12550, loss[loss=0.1203, beats_loss=0.01156, ecapa_loss=0.0002579, whisper_loss=0.1062, over 19278.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01229, ecapa_loss=0.0002865, whisper_loss=0.09836, over 3910016.57 frames. 
], batch size: 74, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:26:03,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=415380.0, ans=0.2 2024-08-10 06:26:03,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=415380.0, ans=0.0 2024-08-10 06:26:15,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=415380.0, ans=0.125 2024-08-10 06:26:16,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=415480.0, ans=0.125 2024-08-10 06:26:22,851 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 06:26:25,654 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 06:26:28,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=415480.0, ans=0.2 2024-08-10 06:26:34,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=415580.0, ans=0.2 2024-08-10 06:26:39,619 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 32 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 06:26:40,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=415580.0, ans=0.125 2024-08-10 06:26:45,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=415680.0, ans=0.125 2024-08-10 06:26:51,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=415680.0, ans=0.1 2024-08-10 06:26:57,928 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 06:27:00,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=415780.0, ans=0.05 2024-08-10 06:27:09,030 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 06:27:09,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=415780.0, ans=0.2 2024-08-10 06:27:09,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-08-10 06:27:14,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=415880.0, ans=0.1 2024-08-10 06:27:14,726 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12600, loss[loss=0.08799, beats_loss=0.00871, ecapa_loss=0.0003112, whisper_loss=0.07617, over 15582.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01221, ecapa_loss=0.0002882, whisper_loss=0.099, over 3924050.34 frames. ], batch size: 60, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:27:18,305 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 40 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 06:27:22,049 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
30 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 06:27:29,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=415880.0, ans=0.0 2024-08-10 06:27:34,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=415980.0, ans=0.125 2024-08-10 06:27:35,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.630e+01 3.161e+01 3.514e+01 4.071e+01 6.890e+01, threshold=7.028e+01, percent-clipped=0.0 2024-08-10 06:27:49,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=416080.0, ans=0.125 2024-08-10 06:27:50,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=416080.0, ans=0.125 2024-08-10 06:28:08,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2024-08-10 06:28:15,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=416180.0, ans=0.125 2024-08-10 06:28:17,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=416180.0, ans=0.1 2024-08-10 06:28:38,360 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12650, loss[loss=0.0934, beats_loss=0.01367, ecapa_loss=0.0003495, whisper_loss=0.07623, over 19835.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01228, ecapa_loss=0.0002897, whisper_loss=0.09863, over 3924764.03 frames. ], batch size: 84, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:28:45,713 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 06:28:45,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=416380.0, ans=0.125 2024-08-10 06:28:50,481 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 06:29:05,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2024-08-10 06:29:30,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=416680.0, ans=0.125 2024-08-10 06:29:31,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=416680.0, ans=0.0 2024-08-10 06:29:35,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=416680.0, ans=0.07 2024-08-10 06:29:47,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=416780.0, ans=10.0 2024-08-10 06:29:47,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416780.0, ans=0.1 2024-08-10 06:30:00,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12700, loss[loss=0.1381, beats_loss=0.009286, ecapa_loss=0.0002814, whisper_loss=0.126, over 21393.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01222, ecapa_loss=0.000291, whisper_loss=0.09828, over 3926755.19 frames. ], batch size: 81, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:30:12,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.07 vs. 
limit=6.0 2024-08-10 06:30:23,226 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 3.052e+01 3.376e+01 3.987e+01 6.626e+01, threshold=6.752e+01, percent-clipped=0.0 2024-08-10 06:30:37,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-10 06:30:37,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.37 vs. limit=15.0 2024-08-10 06:30:39,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=417080.0, ans=0.2 2024-08-10 06:30:50,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2024-08-10 06:30:57,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=417080.0, ans=0.125 2024-08-10 06:30:59,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=417180.0, ans=0.0 2024-08-10 06:31:01,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417180.0, ans=0.1 2024-08-10 06:31:08,407 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 06:31:23,653 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.81 vs. 
limit=15.0 2024-08-10 06:31:27,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=417280.0, ans=0.2 2024-08-10 06:31:40,447 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12750, loss[loss=0.1233, beats_loss=0.01028, ecapa_loss=0.0002833, whisper_loss=0.1102, over 19756.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01226, ecapa_loss=0.000292, whisper_loss=0.09823, over 3904688.58 frames. ], batch size: 78, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:32:02,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=417480.0, ans=0.0 2024-08-10 06:32:02,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=417480.0, ans=0.2 2024-08-10 06:32:26,136 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-10 06:32:52,592 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.70 vs. limit=22.5 2024-08-10 06:32:54,724 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 06:32:56,657 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 06:32:58,044 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 06:32:59,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=417780.0, ans=0.125 2024-08-10 06:33:20,321 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12800, loss[loss=0.1186, beats_loss=0.01176, ecapa_loss=0.0003501, whisper_loss=0.1033, over 22135.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.0124, ecapa_loss=0.0002922, whisper_loss=0.09778, over 3919733.51 frames. 
], batch size: 93, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:33:34,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=417880.0, ans=0.125 2024-08-10 06:33:39,916 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 06:33:42,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 3.114e+01 3.592e+01 4.168e+01 8.043e+01, threshold=7.184e+01, percent-clipped=1.0 2024-08-10 06:34:16,733 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.13 vs. limit=15.0 2024-08-10 06:34:37,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=418180.0, ans=0.125 2024-08-10 06:34:59,450 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12850, loss[loss=0.1093, beats_loss=0.007237, ecapa_loss=0.0003872, whisper_loss=0.09814, over 14310.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01233, ecapa_loss=0.0002922, whisper_loss=0.09799, over 3912348.71 frames. ], batch size: 55, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:35:04,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=418380.0, ans=0.125 2024-08-10 06:35:39,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=418580.0, ans=0.0 2024-08-10 06:35:41,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2024-08-10 06:35:49,079 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 06:35:51,896 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-10 06:36:06,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.57 vs. limit=22.5 2024-08-10 06:36:09,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12900, loss[loss=0.1154, beats_loss=0.01174, ecapa_loss=0.0002761, whisper_loss=0.1009, over 16640.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01234, ecapa_loss=0.0002924, whisper_loss=0.09813, over 3896693.46 frames. ], batch size: 63, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:36:10,046 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-10 06:36:13,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=418880.0, ans=0.125 2024-08-10 06:36:24,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=418980.0, ans=0.0 2024-08-10 06:36:26,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 3.152e+01 3.621e+01 4.177e+01 6.125e+01, threshold=7.242e+01, percent-clipped=0.0 2024-08-10 06:36:31,244 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 06:36:57,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=419180.0, ans=0.0 2024-08-10 06:37:06,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=419280.0, ans=0.2 2024-08-10 06:37:10,202 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 06:37:10,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. 
limit=15.0 2024-08-10 06:37:17,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=419280.0, ans=0.0 2024-08-10 06:37:19,395 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 12950, loss[loss=0.1089, beats_loss=0.01115, ecapa_loss=0.0002853, whisper_loss=0.09492, over 23465.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01228, ecapa_loss=0.0002905, whisper_loss=0.0978, over 3925368.40 frames. ], batch size: 92, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:37:25,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=419380.0, ans=0.125 2024-08-10 06:37:25,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=419380.0, ans=0.125 2024-08-10 06:37:25,459 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.986e+05 2024-08-10 06:37:40,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419480.0, ans=0.125 2024-08-10 06:37:58,046 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 37 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 06:38:06,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=419680.0, ans=0.0 2024-08-10 06:38:08,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=419680.0, ans=0.0 2024-08-10 06:38:28,362 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13000, loss[loss=0.09377, beats_loss=0.01495, ecapa_loss=0.0002944, whisper_loss=0.07588, over 22793.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01234, ecapa_loss=0.0002931, whisper_loss=0.09821, over 3928498.71 frames. 
], batch size: 96, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:38:31,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=419880.0, ans=0.125 2024-08-10 06:38:45,812 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 3.317e+01 3.869e+01 4.527e+01 7.040e+01, threshold=7.738e+01, percent-clipped=0.0 2024-08-10 06:38:58,796 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0 2024-08-10 06:38:59,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420080.0, ans=0.1 2024-08-10 06:39:17,904 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-10 06:39:23,129 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-10 06:39:42,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13050, loss[loss=0.1322, beats_loss=0.0112, ecapa_loss=0.0002694, whisper_loss=0.1183, over 20989.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01234, ecapa_loss=0.0002914, whisper_loss=0.09792, over 3874896.19 frames. 
], batch size: 77, lr: 1.76e-02, grad_scale: 67108864.0 2024-08-10 06:40:08,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420480.0, ans=0.1 2024-08-10 06:40:15,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=420580.0, ans=0.125 2024-08-10 06:40:16,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=420580.0, ans=0.1 2024-08-10 06:40:17,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=420580.0, ans=0.04949747468305833 2024-08-10 06:40:21,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=420580.0, ans=0.0 2024-08-10 06:40:33,365 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 06:40:43,606 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 06:40:56,592 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13100, loss[loss=0.1131, beats_loss=0.01244, ecapa_loss=0.0002833, whisper_loss=0.09778, over 20005.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01225, ecapa_loss=0.0002914, whisper_loss=0.09834, over 3870916.53 frames. ], batch size: 82, lr: 1.76e-02, grad_scale: 67108864.0 2024-08-10 06:41:02,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=420880.0, ans=0.2 2024-08-10 06:41:03,188 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 06:41:14,672 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.548e+01 3.107e+01 3.501e+01 3.954e+01 7.732e+01, threshold=7.002e+01, percent-clipped=0.0 2024-08-10 06:41:15,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420980.0, ans=0.1 2024-08-10 06:41:25,336 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.54 vs. limit=15.0 2024-08-10 06:41:32,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2024-08-10 06:41:35,484 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-10 06:41:38,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=421080.0, ans=0.0 2024-08-10 06:41:56,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=421280.0, ans=0.125 2024-08-10 06:42:12,468 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13150, loss[loss=0.1322, beats_loss=0.01077, ecapa_loss=0.0003049, whisper_loss=0.1184, over 19328.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.0123, ecapa_loss=0.0002895, whisper_loss=0.09826, over 3887870.36 frames. ], batch size: 73, lr: 1.76e-02, grad_scale: 67108864.0 2024-08-10 06:42:33,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=421480.0, ans=0.2 2024-08-10 06:42:42,642 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.49 vs. 
limit=15.0 2024-08-10 06:42:46,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=421580.0, ans=0.09899494936611666 2024-08-10 06:42:59,051 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 06:43:02,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=421680.0, ans=0.035 2024-08-10 06:43:22,958 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 06:43:25,702 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13200, loss[loss=0.115, beats_loss=0.01214, ecapa_loss=0.0003002, whisper_loss=0.09989, over 22996.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01224, ecapa_loss=0.0002918, whisper_loss=0.09829, over 3873439.73 frames. ], batch size: 94, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:43:42,925 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.426e+01 3.092e+01 3.479e+01 4.168e+01 6.203e+01, threshold=6.958e+01, percent-clipped=0.0 2024-08-10 06:43:53,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0 2024-08-10 06:43:56,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=422080.0, ans=0.125 2024-08-10 06:44:02,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=422080.0, ans=0.125 2024-08-10 06:44:11,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=422180.0, ans=0.125 2024-08-10 06:44:12,514 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 06:44:41,687 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13250, loss[loss=0.1409, beats_loss=0.009387, ecapa_loss=0.0003009, whisper_loss=0.1285, over 17146.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01223, ecapa_loss=0.0002931, whisper_loss=0.09901, over 3870727.13 frames. ], batch size: 63, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:44:45,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=422380.0, ans=0.125 2024-08-10 06:45:02,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=422480.0, ans=0.025 2024-08-10 06:45:14,969 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 06:45:22,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=422580.0, ans=0.1 2024-08-10 06:45:52,509 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.65 vs. limit=10.0 2024-08-10 06:45:54,971 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 06:45:57,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13300, loss[loss=0.09654, beats_loss=0.01392, ecapa_loss=0.00031, whisper_loss=0.07952, over 18224.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01229, ecapa_loss=0.0002903, whisper_loss=0.0989, over 3897484.12 frames. ], batch size: 77, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:46:02,438 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-08-10 06:46:06,882 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
21 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-10 06:46:15,275 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 3.388e+01 3.671e+01 4.200e+01 6.497e+01, threshold=7.342e+01, percent-clipped=0.0 2024-08-10 06:46:22,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=422980.0, ans=0.1 2024-08-10 06:46:27,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=423080.0, ans=0.0 2024-08-10 06:46:32,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=423080.0, ans=0.2 2024-08-10 06:46:33,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=423080.0, ans=0.1 2024-08-10 06:46:49,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=423180.0, ans=0.1 2024-08-10 06:46:51,777 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 06:47:09,431 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 40 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 06:47:10,668 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13350, loss[loss=0.1493, beats_loss=0.009027, ecapa_loss=0.0003278, whisper_loss=0.137, over 23771.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01228, ecapa_loss=0.0002898, whisper_loss=0.09928, over 3885173.52 frames. ], batch size: 92, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:47:22,210 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. 
limit=15.0 2024-08-10 06:47:32,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=423480.0, ans=22.5 2024-08-10 06:47:46,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=423580.0, ans=0.0 2024-08-10 06:48:05,933 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 06:48:13,326 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 13 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 06:48:20,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=423780.0, ans=0.1 2024-08-10 06:48:24,501 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13400, loss[loss=0.1245, beats_loss=0.01127, ecapa_loss=0.0003641, whisper_loss=0.1096, over 21819.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01229, ecapa_loss=0.000291, whisper_loss=0.09888, over 3858317.14 frames. ], batch size: 94, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:48:30,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=423880.0, ans=0.0 2024-08-10 06:48:37,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.43 vs. 
limit=15.0 2024-08-10 06:48:38,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=423980.0, ans=0.1 2024-08-10 06:48:42,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.346e+01 3.269e+01 3.722e+01 4.193e+01 5.690e+01, threshold=7.444e+01, percent-clipped=0.0 2024-08-10 06:48:46,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=423980.0, ans=0.125 2024-08-10 06:49:09,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424180.0, ans=0.1 2024-08-10 06:49:11,683 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 12 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-10 06:49:17,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=424180.0, ans=0.125 2024-08-10 06:49:19,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424180.0, ans=0.1 2024-08-10 06:49:28,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=424280.0, ans=0.0 2024-08-10 06:49:28,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=424280.0, ans=0.125 2024-08-10 06:49:33,257 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 36 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 06:49:38,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13450, loss[loss=0.0871, beats_loss=0.01524, ecapa_loss=0.0003291, whisper_loss=0.06857, over 17366.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01226, ecapa_loss=0.0002917, whisper_loss=0.09872, over 3861179.44 frames. 
], batch size: 74, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:49:55,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=424480.0, ans=0.1 2024-08-10 06:50:37,779 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 06:50:49,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=424880.0, ans=0.0 2024-08-10 06:50:50,379 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13500, loss[loss=0.1044, beats_loss=0.01281, ecapa_loss=0.0002448, whisper_loss=0.08917, over 22848.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01224, ecapa_loss=0.0002916, whisper_loss=0.09892, over 3864124.86 frames. ], batch size: 93, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:51:01,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=424880.0, ans=0.2 2024-08-10 06:51:07,540 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.618e+01 3.316e+01 3.785e+01 4.530e+01 1.081e+02, threshold=7.570e+01, percent-clipped=1.0 2024-08-10 06:51:25,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=425080.0, ans=0.125 2024-08-10 06:51:28,334 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-10 06:51:31,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=425080.0, ans=0.1 2024-08-10 06:51:34,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=425180.0, ans=0.125 2024-08-10 06:51:41,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=425180.0, ans=0.0 2024-08-10 06:51:58,162 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 06:52:01,919 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13550, loss[loss=0.09781, beats_loss=0.0153, ecapa_loss=0.0002211, whisper_loss=0.0803, over 16162.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01232, ecapa_loss=0.0002913, whisper_loss=0.09872, over 3856135.32 frames. ], batch size: 66, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:52:02,756 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-08-10 06:52:19,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-08-10 06:52:29,004 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 06:52:48,532 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 06:53:05,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. 
limit=15.0 2024-08-10 06:53:09,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=425780.0, ans=0.5 2024-08-10 06:53:13,375 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13600, loss[loss=0.1119, beats_loss=0.01424, ecapa_loss=0.0002894, whisper_loss=0.09481, over 23276.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01238, ecapa_loss=0.0002906, whisper_loss=0.0976, over 3848411.48 frames. ], batch size: 95, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:53:30,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 3.163e+01 3.442e+01 4.144e+01 6.667e+01, threshold=6.884e+01, percent-clipped=0.0 2024-08-10 06:53:32,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=425980.0, ans=0.125 2024-08-10 06:53:34,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=425980.0, ans=0.125 2024-08-10 06:53:54,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=426080.0, ans=0.125 2024-08-10 06:53:58,588 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.82 vs. limit=15.0 2024-08-10 06:53:59,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=1.96 vs. limit=15.0 2024-08-10 06:54:02,557 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 06:54:19,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=426280.0, ans=0.5 2024-08-10 06:54:24,338 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13650, loss[loss=0.1244, beats_loss=0.01024, ecapa_loss=0.000348, whisper_loss=0.1107, over 14092.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01233, ecapa_loss=0.0002922, whisper_loss=0.09768, over 3861732.62 frames. ], batch size: 55, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:54:24,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=426380.0, ans=0.125 2024-08-10 06:54:29,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426380.0, ans=0.1 2024-08-10 06:54:30,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=426380.0, ans=0.0 2024-08-10 06:54:36,195 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.53 vs. limit=15.0 2024-08-10 06:54:38,105 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 06:54:45,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=426480.0, ans=0.0 2024-08-10 06:54:47,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=426480.0, ans=0.1 2024-08-10 06:55:26,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.80 vs. 
limit=22.5 2024-08-10 06:55:33,195 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13700, loss[loss=0.1139, beats_loss=0.01326, ecapa_loss=0.0003381, whisper_loss=0.09727, over 19863.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01235, ecapa_loss=0.0002917, whisper_loss=0.09804, over 3851049.88 frames. ], batch size: 81, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:55:49,040 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.603e+01 3.221e+01 3.630e+01 4.052e+01 7.780e+01, threshold=7.261e+01, percent-clipped=2.0 2024-08-10 06:55:50,417 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 06:55:52,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=426980.0, ans=0.125 2024-08-10 06:55:53,268 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 21 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-10 06:55:59,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=427080.0, ans=0.125 2024-08-10 06:56:00,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.63 vs. limit=12.0 2024-08-10 06:56:09,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=427080.0, ans=0.1 2024-08-10 06:56:28,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.99 vs. limit=15.0 2024-08-10 06:56:43,270 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13750, loss[loss=0.1172, beats_loss=0.01222, ecapa_loss=0.0003019, whisper_loss=0.102, over 22920.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01232, ecapa_loss=0.0002917, whisper_loss=0.09824, over 3856772.94 frames. 
], batch size: 93, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:56:49,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=427380.0, ans=10.0 2024-08-10 06:56:52,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=427380.0, ans=0.1 2024-08-10 06:57:00,540 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-10 06:57:02,227 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-10 06:57:20,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=427580.0, ans=0.0 2024-08-10 06:57:24,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=427680.0, ans=0.125 2024-08-10 06:57:28,895 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 06:57:30,172 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 06:57:53,098 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13800, loss[loss=0.1244, beats_loss=0.01166, ecapa_loss=0.0002291, whisper_loss=0.1105, over 22398.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01235, ecapa_loss=0.0002922, whisper_loss=0.09811, over 3895820.31 frames. 
], batch size: 85, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:57:57,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=427880.0, ans=0.125 2024-08-10 06:58:10,198 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.463e+01 3.325e+01 3.732e+01 4.469e+01 6.721e+01, threshold=7.464e+01, percent-clipped=0.0 2024-08-10 06:58:10,461 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 38 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 06:58:10,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=427980.0, ans=0.2 2024-08-10 06:58:15,117 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.21 vs. limit=15.0 2024-08-10 06:58:24,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=428080.0, ans=0.0 2024-08-10 06:58:36,940 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-10 06:58:39,793 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 06:58:48,373 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-08-10 06:59:02,320 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13850, loss[loss=0.08953, beats_loss=0.0177, ecapa_loss=0.0002088, whisper_loss=0.06975, over 17811.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.0124, ecapa_loss=0.0002893, whisper_loss=0.09831, over 3888188.18 frames. ], batch size: 71, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:59:17,653 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 06:59:38,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=428580.0, ans=0.125 2024-08-10 06:59:44,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=428680.0, ans=0.125 2024-08-10 06:59:47,390 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 06:59:49,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=12.0 2024-08-10 06:59:56,521 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 07:00:09,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=428880.0, ans=0.0 2024-08-10 07:00:09,980 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13900, loss[loss=0.1137, beats_loss=0.009791, ecapa_loss=0.0002586, whisper_loss=0.1013, over 18847.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.0122, ecapa_loss=0.0002897, whisper_loss=0.09973, over 3916998.60 frames. ], batch size: 70, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:00:15,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0 2024-08-10 07:00:17,581 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.67 vs. limit=10.0 2024-08-10 07:00:26,508 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.310e+01 3.794e+01 4.612e+01 1.013e+02, threshold=7.587e+01, percent-clipped=2.0 2024-08-10 07:00:34,875 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
17 from LS+wenet, 24 from Vox, 42 fro AS
2024-08-09 07:01:00,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=429180.0, ans=0.125
2024-08-10 07:01:03,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=429280.0, ans=0.0
2024-08-10 07:01:09,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=429280.0, ans=0.07
2024-08-10 07:01:12,724 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 19 from Vox, 19 fro AS
2024-08-10 07:01:18,165 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 13950, loss[loss=0.0941, beats_loss=0.01187, ecapa_loss=0.0003441, whisper_loss=0.07879, over 18544.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01226, ecapa_loss=0.0002899, whisper_loss=0.09831, over 3889290.56 frames. ], batch size: 76, lr: 1.74e-02, grad_scale: 67108864.0
2024-08-10 07:01:18,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=429380.0, ans=0.125
2024-08-10 07:01:21,143 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS
2024-08-10 07:01:22,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=429380.0, ans=0.125
2024-08-10 07:01:25,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=429380.0, ans=0.1
2024-08-10 07:01:46,340 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 33 from LS+wenet, 18 from Vox, 32 fro AS
2024-08-10 07:01:51,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=429580.0, ans=0.0
2024-08-10 07:02:08,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=429680.0, ans=0.95
2024-08-10 07:02:11,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=429680.0, ans=0.09899494936611666
2024-08-10 07:02:20,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429780.0, ans=0.1
2024-08-10 07:02:25,473 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 18 from Vox, 36 fro AS
2024-08-10 07:02:26,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 14000, loss[loss=0.1242, beats_loss=0.01171, ecapa_loss=0.0002373, whisper_loss=0.1101, over 21447.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01227, ecapa_loss=0.0002902, whisper_loss=0.09774, over 3897431.67 frames. ], batch size: 81, lr: 1.74e-02, grad_scale: 67108864.0
2024-08-10 07:02:43,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 3.368e+01 3.902e+01 4.630e+01 2.044e+02, threshold=7.804e+01, percent-clipped=2.0
2024-08-10 07:03:05,107 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 34 from LS+wenet, 27 from Vox, 33 fro AS
2024-08-10 07:03:16,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=430180.0, ans=0.1
2024-08-10 07:03:26,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=430280.0, ans=0.0
2024-08-10 07:03:35,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 14050, loss[loss=0.1098, beats_loss=0.0105, ecapa_loss=0.0003012, whisper_loss=0.09626, over 14038.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01228, ecapa_loss=0.0002895, whisper_loss=0.09847, over 3886146.67 frames. ], batch size: 54, lr: 1.74e-02, grad_scale: 67108864.0
2024-08-10 07:03:38,261 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 27 from Vox, 31 fro AS
2024-08-10 07:03:51,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=430480.0, ans=0.0
2024-08-10 07:04:34,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=430780.0, ans=0.1
2024-08-10 07:04:44,919 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 14100, loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0003677, whisper_loss=0.08846, over 17030.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01228, ecapa_loss=0.0002892, whisper_loss=0.09812, over 3895484.07 frames. ], batch size: 71, lr: 1.74e-02, grad_scale: 67108864.0
2024-08-10 07:04:47,703 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS
2024-08-10 07:04:56,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=430980.0, ans=0.125
2024-08-10 07:04:59,243 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS
2024-08-10 07:05:00,220 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.555e+01 3.108e+01 3.411e+01 4.014e+01 7.175e+01, threshold=6.821e+01, percent-clipped=1.0
2024-08-10 07:05:10,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=431080.0, ans=0.05
2024-08-10 07:05:22,231 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 13 from Vox, 30 fro AS
2024-08-10 07:05:23,722 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 22 from Vox, 33 fro AS
2024-08-10 07:05:26,734 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 fro AS
2024-08-10 07:05:38,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0
2024-08-10 07:05:52,301 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 14150, loss[loss=0.1063, beats_loss=0.01291, ecapa_loss=0.0002275, whisper_loss=0.09108, over 15056.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01225, ecapa_loss=0.0002874, whisper_loss=0.09794, over 3868113.35 frames. ], batch size: 58, lr: 1.74e-02, grad_scale: 67108864.0
2024-08-10 07:05:58,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=431380.0, ans=0.125
2024-08-10 07:06:02,343 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS
2024-08-10 07:06:05,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs.
limit=15.0
2024-08-10 07:06:25,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=431580.0, ans=0.125
2024-08-10 07:06:32,339 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 27 from Vox, 36 fro AS
2024-08-10 07:06:38,184 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS
2024-08-10 07:07:01,268 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 14200, loss[loss=0.1178, beats_loss=0.0103, ecapa_loss=0.0003365, whisper_loss=0.1041, over 19796.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01231, ecapa_loss=0.0002859, whisper_loss=0.09761, over 3883537.66 frames. ], batch size: 78, lr: 1.73e-02, grad_scale: 67108864.0
2024-08-10 07:07:07,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=431880.0, ans=0.0
2024-08-10 07:07:13,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=431880.0, ans=0.125
2024-08-10 07:07:15,703 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 fro AS
2024-08-10 07:07:18,317 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.671e+01 3.227e+01 3.786e+01 4.277e+01 7.139e+01, threshold=7.572e+01, percent-clipped=1.0
2024-08-10 07:07:23,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=431980.0, ans=0.125
2024-08-10 07:07:31,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=432080.0, ans=0.125
2024-08-10 07:07:40,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=432080.0, ans=0.125
2024-08-10 07:07:50,435 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 13 from Vox, 27 fro AS
2024-08-10 07:08:11,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.35 vs. limit=15.0
2024-08-10 07:08:11,368 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 14250, loss[loss=0.1091, beats_loss=0.009793, ecapa_loss=0.0003232, whisper_loss=0.09609, over 15450.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01231, ecapa_loss=0.0002853, whisper_loss=0.09756, over 3900541.06 frames. ], batch size: 58, lr: 1.73e-02, grad_scale: 67108864.0
2024-08-10 07:08:18,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=432380.0, ans=0.125
2024-08-10 07:08:24,519 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-10 07:08:28,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=432480.0, ans=0.0
2024-08-10 07:08:30,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=432480.0, ans=0.0
2024-08-10 07:08:40,227 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.83 vs. limit=22.5
2024-08-10 07:08:42,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=432580.0, ans=0.125
2024-08-10 07:08:57,523 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 19 from LS+wenet, 23 from Vox, 48 fro AS
2024-08-10 07:08:58,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=432680.0, ans=0.0
2024-08-10 07:09:20,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 14300, loss[loss=0.1204, beats_loss=0.01342, ecapa_loss=0.0002206, whisper_loss=0.1047, over 20655.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01234, ecapa_loss=0.000284, whisper_loss=0.09796, over 3927394.44 frames. ], batch size: 79, lr: 1.73e-02, grad_scale: 67108864.0
2024-08-10 07:09:20,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=432880.0, ans=0.0
2024-08-10 07:09:30,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=432880.0, ans=0.125
2024-08-10 07:09:36,815 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 3.216e+01 3.597e+01 4.195e+01 6.015e+01, threshold=7.194e+01, percent-clipped=0.0
2024-08-10 07:09:59,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=433080.0, ans=0.0
2024-08-10 07:10:03,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433180.0, ans=0.1
2024-08-10 07:10:05,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0
2024-08-10 07:10:10,140 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 23 from Vox, 32 fro AS
2024-08-10 07:10:10,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=433180.0, ans=0.125
2024-08-10 07:10:22,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=433280.0, ans=0.0
2024-08-10 07:10:25,470 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=15.0
2024-08-10 07:10:27,562 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 30 from Vox, 34 fro AS
2024-08-10 07:10:28,797 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 14350, loss[loss=0.09914, beats_loss=0.01261, ecapa_loss=0.0003429, whisper_loss=0.0831, over 20394.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01242, ecapa_loss=0.0002841, whisper_loss=0.09763, over 3926682.79 frames. ], batch size: 86, lr: 1.73e-02, grad_scale: 67108864.0
2024-08-10 07:10:41,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=433480.0, ans=0.2
2024-08-10 07:10:44,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=433480.0, ans=0.125
2024-08-10 07:11:07,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=433580.0, ans=0.0
2024-08-10 07:11:10,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=433680.0, ans=0.125
2024-08-10 07:11:12,984 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
29 from LS+wenet, 16 from Vox, 46 fro AS
2024-08-10 07:11:13,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=433680.0, ans=0.125
2024-08-10 07:11:36,809 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 14400, loss[loss=0.1003, beats_loss=0.01264, ecapa_loss=0.0002856, whisper_loss=0.08484, over 22202.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01243, ecapa_loss=0.0002876, whisper_loss=0.09724, over 3919435.75 frames. ], batch size: 88, lr: 1.73e-02, grad_scale: 67108864.0
2024-08-10 07:11:49,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=433980.0, ans=0.0
2024-08-10 07:11:53,319 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 3.382e+01 3.755e+01 4.286e+01 6.808e+01, threshold=7.511e+01, percent-clipped=0.0
2024-08-10 07:11:53,983 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.33 vs. limit=22.5
2024-08-10 07:11:54,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=433980.0, ans=0.125
2024-08-10 07:12:08,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=434080.0, ans=0.125
2024-08-10 07:12:08,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=434080.0, ans=0.125
2024-08-10 07:12:18,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=434180.0, ans=0.125
2024-08-10 07:12:22,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=434180.0, ans=0.0
2024-08-10 07:12:27,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=434180.0, ans=0.125
2024-08-10 07:12:33,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=434280.0, ans=0.1
2024-08-10 07:12:37,906 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5
2024-08-10 07:12:44,484 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=15.0
2024-08-10 07:12:45,106 INFO [train_multi_KD3.py:1116] (2/4) Epoch 3, batch 14450, loss[loss=0.1395, beats_loss=0.009089, ecapa_loss=0.0003258, whisper_loss=0.1271, over 22151.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01238, ecapa_loss=0.0002886, whisper_loss=0.09771, over 3908358.68 frames. ], batch size: 90, lr: 1.73e-02, grad_scale: 67108864.0
2024-08-10 07:12:47,954 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS
2024-08-10 07:13:02,491 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 19 from Vox, 29 fro AS
2024-08-10 07:13:02,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=434480.0, ans=0.025
2024-08-10 07:13:06,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=434480.0, ans=0.125
2024-08-10 07:13:19,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=434580.0, ans=0.125
2024-08-10 07:13:20,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=434580.0, ans=0.1
2024-08-10 07:13:21,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=434580.0, ans=0.2
2024-08-10 07:13:24,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=434680.0, ans=0.1
2024-08-10 07:13:29,582 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 22 from Vox, 34 fro AS
2024-08-10 07:14:14,464 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 0, loss[loss=0.08852, beats_loss=0.01352, ecapa_loss=0.000351, whisper_loss=0.07149, over 17094.00 frames. ], tot_loss[loss=0.08852, beats_loss=0.01352, ecapa_loss=0.000351, whisper_loss=0.07149, over 17094.00 frames. ], batch size: 71, lr: 1.62e-02, grad_scale: 67108864.0
2024-08-10 07:14:14,464 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-10 07:14:55,811 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on ASR_libri: loss=0.268, beats_loss=0, ecapa_loss=0.0008857, whisper_loss=0.2592, over 922467.00 frames.
2024-08-10 07:15:10,885 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on SV_voxceleb1: loss=0.007801, beats_loss=0, ecapa_loss=0.0007801, whisper_loss=0, over 939242.00 frames.
2024-08-10 07:17:09,592 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on AT_audioset: loss=0.02834, beats_loss=0.02834, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-10 07:17:09,596 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB
2024-08-10 07:17:36,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=434870.0, ans=0.125
2024-08-10 07:18:11,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=434970.0, ans=0.125
2024-08-10 07:18:12,751 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+01 3.318e+01 3.888e+01 4.583e+01 8.270e+01, threshold=7.777e+01, percent-clipped=1.0
2024-08-10 07:18:40,940 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 fro AS
2024-08-10 07:18:47,882 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 28 from Vox, 32 fro AS
2024-08-10 07:19:19,968 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 50, loss[loss=0.1072, beats_loss=0.01332, ecapa_loss=0.0003014, whisper_loss=0.09086, over 21788.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01249, ecapa_loss=0.0002935, whisper_loss=0.09511, over 916853.26 frames.
], batch size: 88, lr: 1.62e-02, grad_scale: 67108864.0
2024-08-10 07:20:18,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=435470.0, ans=0.125
2024-08-10 07:20:27,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=435470.0, ans=0.125
2024-08-10 07:20:45,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=435570.0, ans=0.2
2024-08-10 07:20:47,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=435570.0, ans=0.1
2024-08-10 07:20:57,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=435670.0, ans=0.0
2024-08-10 07:21:15,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435670.0, ans=0.1
2024-08-10 07:21:21,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 100, loss[loss=0.1521, beats_loss=0.007959, ecapa_loss=0.0003009, whisper_loss=0.1412, over 17350.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01222, ecapa_loss=0.0002919, whisper_loss=0.09598, over 1550356.71 frames. ], batch size: 68, lr: 1.61e-02, grad_scale: 67108864.0
2024-08-10 07:21:23,313 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 24 from Vox, 46 fro AS
2024-08-10 07:21:44,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=435870.0, ans=0.0
2024-08-10 07:21:50,565 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 22 from Vox, 33 fro AS
2024-08-10 07:22:14,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.841e+01 3.372e+01 3.715e+01 4.340e+01 6.479e+01, threshold=7.429e+01, percent-clipped=0.0
2024-08-10 07:22:33,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=436070.0, ans=0.125
2024-08-10 07:23:06,827 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS
2024-08-10 07:23:11,022 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 29 from LS+wenet, 20 from Vox, 23 fro AS
2024-08-10 07:23:11,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=436170.0, ans=0.2
2024-08-10 07:23:14,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 150, loss[loss=0.1185, beats_loss=0.01147, ecapa_loss=0.0003263, whisper_loss=0.1038, over 19452.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01229, ecapa_loss=0.0002855, whisper_loss=0.09709, over 2051797.07 frames. ], batch size: 79, lr: 1.61e-02, grad_scale: 67108864.0
2024-08-10 07:23:21,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=436270.0, ans=0.125
2024-08-10 07:23:33,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=12.0
2024-08-10 07:23:34,034 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 19 from Vox, 39 fro AS
2024-08-10 07:23:38,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=436370.0, ans=0.09899494936611666
2024-08-10 07:23:39,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=436370.0, ans=0.025
2024-08-10 07:23:41,020 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 26 from Vox, 23 fro AS
2024-08-10 07:23:46,277 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 fro AS
2024-08-10 07:23:51,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=436470.0, ans=0.5
2024-08-10 07:23:54,478 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS
2024-08-10 07:23:58,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=436470.0, ans=0.125
2024-08-10 07:24:00,882 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS
2024-08-10 07:24:03,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=436470.0, ans=0.2
2024-08-10 07:24:04,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=436570.0, ans=0.1
2024-08-10 07:24:15,957 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 22 from Vox, 32 fro AS
2024-08-10 07:24:17,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=436570.0, ans=0.125
2024-08-10 07:24:26,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=436670.0, ans=0.1
2024-08-10 07:24:38,877 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 200, loss[loss=0.1543, beats_loss=0.01036, ecapa_loss=0.0003072, whisper_loss=0.1408, over 21861.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01225, ecapa_loss=0.0002844, whisper_loss=0.09744, over 2420574.30 frames. ], batch size: 82, lr: 1.61e-02, grad_scale: 67108864.0
2024-08-10 07:24:38,989 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS
2024-08-10 07:24:44,349 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 25 from LS+wenet, 18 from Vox, 20 fro AS
2024-08-10 07:24:53,613 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS
2024-08-10 07:25:07,104 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS
2024-08-10 07:25:10,573 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 18 from Vox, 40 fro AS
2024-08-10 07:25:12,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=436970.0, ans=0.125
2024-08-10 07:25:12,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs.
limit=15.0
2024-08-10 07:25:13,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=436970.0, ans=0.09899494936611666
2024-08-10 07:25:14,587 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 3.326e+01 3.682e+01 4.488e+01 7.047e+01, threshold=7.364e+01, percent-clipped=0.0
2024-08-10 07:25:16,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=436970.0, ans=0.0
2024-08-10 07:25:37,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=437070.0, ans=0.125
2024-08-10 07:25:49,933 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 17 from Vox, 39 fro AS
2024-08-10 07:25:54,439 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS
2024-08-10 07:25:57,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 250, loss[loss=0.09992, beats_loss=0.01402, ecapa_loss=0.0002069, whisper_loss=0.08383, over 20180.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01231, ecapa_loss=0.0002829, whisper_loss=0.09714, over 2764248.27 frames. ], batch size: 77, lr: 1.61e-02, grad_scale: 67108864.0
2024-08-10 07:26:07,302 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS
2024-08-10 07:26:15,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=437370.0, ans=0.0
2024-08-10 07:26:29,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.58 vs. limit=15.0
2024-08-10 07:26:54,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=437570.0, ans=0.125
2024-08-10 07:27:05,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=437670.0, ans=0.125
2024-08-10 07:27:07,138 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS
2024-08-10 07:27:08,890 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 28 from Vox, 31 fro AS
2024-08-10 07:27:10,371 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS
2024-08-10 07:27:12,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 300, loss[loss=0.1006, beats_loss=0.01128, ecapa_loss=0.0002514, whisper_loss=0.08684, over 22682.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01224, ecapa_loss=0.0002811, whisper_loss=0.09638, over 2984097.98 frames. ], batch size: 85, lr: 1.61e-02, grad_scale: 67108864.0
2024-08-10 07:27:24,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=437770.0, ans=0.0
2024-08-10 07:27:28,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=437870.0, ans=0.125
2024-08-10 07:27:36,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=437870.0, ans=0.0
2024-08-10 07:27:39,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=437870.0, ans=0.125
2024-08-10 07:27:46,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 3.173e+01 3.597e+01 4.305e+01 6.522e+01, threshold=7.194e+01, percent-clipped=0.0
2024-08-10 07:27:50,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=437970.0, ans=0.125
2024-08-10 07:28:06,825 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 22 from Vox, 29 fro AS
2024-08-10 07:28:08,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=438070.0, ans=0.0
2024-08-10 07:28:10,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=438070.0, ans=0.0
2024-08-10 07:28:15,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0
2024-08-10 07:28:24,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0
2024-08-10 07:28:27,307 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 350, loss[loss=0.1116, beats_loss=0.01176, ecapa_loss=0.0002286, whisper_loss=0.09757, over 14906.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.0122, ecapa_loss=0.0002793, whisper_loss=0.09593, over 3162800.96 frames. ], batch size: 55, lr: 1.61e-02, grad_scale: 67108864.0
2024-08-10 07:29:03,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=438470.0, ans=0.0
2024-08-10 07:29:06,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=438470.0, ans=0.125
2024-08-10 07:29:38,938 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 26 from LS+wenet, 19 from Vox, 18 fro AS
2024-08-10 07:29:42,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=438770.0, ans=0.0
2024-08-10 07:29:42,756 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 400, loss[loss=0.1206, beats_loss=0.01291, ecapa_loss=0.0003198, whisper_loss=0.1045, over 22516.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01205, ecapa_loss=0.0002793, whisper_loss=0.0969, over 3318960.61 frames. ], batch size: 92, lr: 1.61e-02, grad_scale: 67108864.0
2024-08-10 07:29:43,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=12.0
2024-08-10 07:29:48,051 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts.
23 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 07:30:12,782 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.367e+00 2024-08-10 07:30:16,557 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+01 3.285e+01 3.710e+01 4.185e+01 8.184e+01, threshold=7.420e+01, percent-clipped=1.0 2024-08-10 07:30:17,009 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 24 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-10 07:30:24,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=438970.0, ans=0.0 2024-08-10 07:30:30,456 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 07:30:34,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=439070.0, ans=22.5 2024-08-10 07:30:53,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=439270.0, ans=0.1 2024-08-10 07:30:54,493 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 450, loss[loss=0.0882, beats_loss=0.01381, ecapa_loss=0.0002038, whisper_loss=0.07235, over 16487.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.012, ecapa_loss=0.0002799, whisper_loss=0.09645, over 3413129.14 frames. 
], batch size: 63, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:31:04,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=439270.0, ans=0.125 2024-08-10 07:31:23,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=439470.0, ans=15.0 2024-08-10 07:31:43,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=439570.0, ans=0.0 2024-08-10 07:31:44,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=439570.0, ans=10.0 2024-08-10 07:31:45,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=439570.0, ans=0.125 2024-08-10 07:31:49,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=439670.0, ans=0.1 2024-08-10 07:31:51,139 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0 2024-08-10 07:32:00,859 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 500, loss[loss=0.08779, beats_loss=0.015, ecapa_loss=0.0002439, whisper_loss=0.07035, over 14630.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01202, ecapa_loss=0.0002786, whisper_loss=0.09658, over 3508003.12 frames. ], batch size: 58, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:32:06,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=439770.0, ans=0.125 2024-08-10 07:32:16,693 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 07:32:27,033 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
18 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 07:32:33,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.388e+01 2.971e+01 3.310e+01 3.858e+01 7.927e+01, threshold=6.621e+01, percent-clipped=1.0 2024-08-10 07:32:34,083 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 from AS 2024-08-10 07:32:59,303 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 from AS 2024-08-10 07:33:05,839 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 from AS 2024-08-10 07:33:09,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 550, loss[loss=0.1027, beats_loss=0.01368, ecapa_loss=0.0002788, whisper_loss=0.08626, over 22031.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01205, ecapa_loss=0.0002761, whisper_loss=0.09596, over 3569242.42 frames. ], batch size: 91, lr: 1.61e-02, grad_scale: 134217728.0 2024-08-10 07:33:17,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=440270.0, ans=0.125 2024-08-10 07:33:20,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=440270.0, ans=0.125 2024-08-10 07:33:46,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=440470.0, ans=0.2 2024-08-10 07:33:51,818 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
31 from LS+wenet, 23 from Vox, 40 from AS 2024-08-10 07:33:56,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=440570.0, ans=0.0 2024-08-10 07:34:10,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=440670.0, ans=0.125 2024-08-10 07:34:11,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=440670.0, ans=0.125 2024-08-10 07:34:14,933 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 600, loss[loss=0.1289, beats_loss=0.01026, ecapa_loss=0.0002467, whisper_loss=0.1162, over 15776.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.012, ecapa_loss=0.0002747, whisper_loss=0.09659, over 3599153.82 frames. ], batch size: 58, lr: 1.61e-02, grad_scale: 134217728.0 2024-08-10 07:34:16,714 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 14 from Vox, 32 from AS 2024-08-10 07:34:16,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=440770.0, ans=0.125 2024-08-10 07:34:18,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=440770.0, ans=0.04949747468305833 2024-08-10 07:34:22,503 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.82 vs. 
limit=15.0 2024-08-10 07:34:27,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=440870.0, ans=0.2 2024-08-10 07:34:45,096 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.403e+01 3.004e+01 3.329e+01 3.797e+01 6.092e+01, threshold=6.657e+01, percent-clipped=0.0 2024-08-10 07:34:51,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=440970.0, ans=0.2 2024-08-10 07:35:07,881 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.761e+03 2024-08-10 07:35:12,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=441170.0, ans=0.0 2024-08-10 07:35:20,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 650, loss[loss=0.1067, beats_loss=0.009705, ecapa_loss=0.0002187, whisper_loss=0.09485, over 18460.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01201, ecapa_loss=0.0002743, whisper_loss=0.09623, over 3673672.38 frames. ], batch size: 66, lr: 1.61e-02, grad_scale: 134217728.0 2024-08-10 07:35:23,020 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 19 from Vox, 28 from AS 2024-08-10 07:35:29,221 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 07:35:39,928 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 07:35:40,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=441370.0, ans=0.125 2024-08-10 07:35:41,401 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 from AS 2024-08-10 07:35:50,369 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
31 from LS+wenet, 8 from Vox, 20 from AS 2024-08-10 07:35:56,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=441470.0, ans=0.125 2024-08-10 07:36:03,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2024-08-10 07:36:26,535 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 700, loss[loss=0.1254, beats_loss=0.01086, ecapa_loss=0.0002728, whisper_loss=0.1118, over 18695.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01193, ecapa_loss=0.0002768, whisper_loss=0.09697, over 3730018.63 frames. ], batch size: 72, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:36:45,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=441870.0, ans=0.0 2024-08-10 07:36:53,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.93 vs. limit=22.5 2024-08-10 07:36:55,322 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 13 from LS+wenet, 18 from Vox, 26 from AS 2024-08-10 07:36:56,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+01 3.137e+01 3.551e+01 4.143e+01 1.211e+02, threshold=7.103e+01, percent-clipped=4.0 2024-08-10 07:36:59,677 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 26 from Vox, 32 from AS 2024-08-10 07:37:04,909 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 15 from Vox, 38 from AS 2024-08-10 07:37:10,102 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
13 from LS+wenet, 23 from Vox, 17 from AS 2024-08-10 07:37:10,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=442070.0, ans=0.0 2024-08-10 07:37:12,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=25.55 vs. limit=15.0 2024-08-10 07:37:19,128 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 from AS 2024-08-10 07:37:26,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=442170.0, ans=0.2 2024-08-10 07:37:26,619 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=15.0 2024-08-10 07:37:27,412 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 from AS 2024-08-10 07:37:32,319 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 750, loss[loss=0.1357, beats_loss=0.008471, ecapa_loss=0.0003119, whisper_loss=0.1241, over 22982.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.012, ecapa_loss=0.0002743, whisper_loss=0.09666, over 3723192.71 frames. ], batch size: 90, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:37:41,445 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 14 from Vox, 33 from AS 2024-08-10 07:37:49,333 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 22 from LS+wenet, 16 from Vox, 17 from AS 2024-08-10 07:37:55,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.16 vs. limit=15.0 2024-08-10 07:37:59,719 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
13 from LS+wenet, 14 from Vox, 29 from AS 2024-08-10 07:38:06,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=442470.0, ans=0.1 2024-08-10 07:38:12,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=442570.0, ans=0.125 2024-08-10 07:38:15,545 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 from AS 2024-08-10 07:38:23,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=442670.0, ans=0.125 2024-08-10 07:38:25,360 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=12.0 2024-08-10 07:38:28,715 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 from AS 2024-08-10 07:38:37,365 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 800, loss[loss=0.0944, beats_loss=0.01356, ecapa_loss=0.000299, whisper_loss=0.07784, over 21520.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01206, ecapa_loss=0.0002734, whisper_loss=0.0962, over 3736375.88 frames. ], batch size: 89, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:38:38,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.29 vs. limit=15.0 2024-08-10 07:38:40,642 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:38:46,945 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
22 from LS+wenet, 23 from Vox, 37 from AS 2024-08-10 07:39:07,620 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.441e+01 2.938e+01 3.331e+01 3.852e+01 7.963e+01, threshold=6.661e+01, percent-clipped=1.0 2024-08-10 07:39:14,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=442970.0, ans=0.025 2024-08-10 07:39:15,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2024-08-10 07:39:15,542 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 19 from Vox, 33 from AS 2024-08-10 07:39:23,217 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 15 from Vox, 28 from AS 2024-08-10 07:39:31,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=443170.0, ans=0.2 2024-08-10 07:39:42,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 850, loss[loss=0.1155, beats_loss=0.01225, ecapa_loss=0.0002738, whisper_loss=0.1005, over 16807.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01208, ecapa_loss=0.0002724, whisper_loss=0.09579, over 3773170.59 frames. ], batch size: 66, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:39:43,165 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 21 from Vox, 26 from AS 2024-08-10 07:39:49,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=443270.0, ans=0.1 2024-08-10 07:39:50,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=443270.0, ans=0.2 2024-08-10 07:39:53,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.00 vs. 
limit=15.0 2024-08-10 07:39:58,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=443370.0, ans=0.05 2024-08-10 07:40:00,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=443370.0, ans=0.125 2024-08-10 07:40:05,841 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=12.0 2024-08-10 07:40:09,321 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 22 from Vox, 23 from AS 2024-08-10 07:40:12,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=443470.0, ans=0.2 2024-08-10 07:40:29,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443570.0, ans=0.1 2024-08-10 07:40:36,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=443670.0, ans=0.0 2024-08-10 07:40:42,322 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 20 from Vox, 27 from AS 2024-08-10 07:40:48,644 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 900, loss[loss=0.1108, beats_loss=0.01336, ecapa_loss=0.0002174, whisper_loss=0.09526, over 19983.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01206, ecapa_loss=0.0002733, whisper_loss=0.09562, over 3752753.82 frames. ], batch size: 78, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:40:48,753 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 29 from Vox, 32 from AS 2024-08-10 07:41:18,855 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 3.112e+01 3.456e+01 3.897e+01 5.995e+01, threshold=6.912e+01, percent-clipped=0.0 2024-08-10 07:41:23,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=443970.0, ans=0.125 2024-08-10 07:41:24,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=443970.0, ans=0.0 2024-08-10 07:41:49,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.18 vs. limit=10.0 2024-08-10 07:41:53,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 950, loss[loss=0.07956, beats_loss=0.01789, ecapa_loss=0.0002049, whisper_loss=0.05962, over 18033.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01213, ecapa_loss=0.0002698, whisper_loss=0.09536, over 3756622.88 frames. ], batch size: 74, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:42:03,763 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0 2024-08-10 07:42:29,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=22.5 2024-08-10 07:42:40,550 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 21 from Vox, 45 from AS 2024-08-10 07:42:59,183 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1000, loss[loss=0.1021, beats_loss=0.01182, ecapa_loss=0.0003298, whisper_loss=0.08696, over 14220.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.0123, ecapa_loss=0.0002665, whisper_loss=0.09479, over 3755242.76 frames. 
], batch size: 57, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:42:59,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=444770.0, ans=0.0 2024-08-10 07:43:11,161 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 28 from Vox, 23 from AS 2024-08-10 07:43:19,299 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0 2024-08-10 07:43:23,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=444870.0, ans=0.125 2024-08-10 07:43:29,137 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 3.226e+01 3.648e+01 4.312e+01 7.271e+01, threshold=7.295e+01, percent-clipped=2.0 2024-08-10 07:43:41,550 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 13 from Vox, 35 from AS 2024-08-10 07:43:41,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=445070.0, ans=0.0 2024-08-10 07:43:44,085 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 from AS 2024-08-10 07:43:44,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=445070.0, ans=0.0 2024-08-10 07:44:04,821 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1050, loss[loss=0.1272, beats_loss=0.01246, ecapa_loss=0.0002187, whisper_loss=0.1125, over 24267.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01228, ecapa_loss=0.0002657, whisper_loss=0.09503, over 3782298.46 frames. ], batch size: 92, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:44:06,268 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 16 from Vox, 29 from AS 2024-08-10 07:44:07,588 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
21 from LS+wenet, 17 from Vox, 31 from AS 2024-08-10 07:44:09,009 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 from AS 2024-08-10 07:44:12,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=445270.0, ans=0.125 2024-08-10 07:44:19,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=445370.0, ans=10.0 2024-08-10 07:44:24,431 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 21 from Vox, 21 from AS 2024-08-10 07:44:36,118 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 from AS 2024-08-10 07:44:37,850 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.964e+00 2024-08-10 07:44:41,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=445470.0, ans=0.125 2024-08-10 07:44:52,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=445570.0, ans=0.125 2024-08-10 07:45:02,406 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 from AS 2024-08-10 07:45:03,715 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 15 from Vox, 35 from AS 2024-08-10 07:45:09,876 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1100, loss[loss=0.123, beats_loss=0.01358, ecapa_loss=0.0002102, whisper_loss=0.1073, over 18617.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01223, ecapa_loss=0.0002638, whisper_loss=0.09588, over 3764834.23 frames. ], batch size: 70, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:45:13,983 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
20 from LS+wenet, 14 from Vox, 23 from AS 2024-08-10 07:45:28,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=445870.0, ans=0.0 2024-08-10 07:45:34,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=445970.0, ans=0.0 2024-08-10 07:45:35,027 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5 2024-08-10 07:45:37,820 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0 2024-08-10 07:45:39,656 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 3.161e+01 3.477e+01 3.934e+01 8.780e+01, threshold=6.953e+01, percent-clipped=2.0 2024-08-10 07:45:55,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=446070.0, ans=0.125 2024-08-10 07:46:04,467 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 from AS 2024-08-10 07:46:14,820 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1150, loss[loss=0.1159, beats_loss=0.01096, ecapa_loss=0.0002512, whisper_loss=0.1024, over 24198.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01226, ecapa_loss=0.0002621, whisper_loss=0.09599, over 3786743.00 frames. ], batch size: 93, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:46:28,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=446370.0, ans=0.125 2024-08-10 07:47:10,029 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 from AS 2024-08-10 07:47:19,289 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
18 from LS+wenet, 14 from Vox, 32 from AS 2024-08-10 07:47:20,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1200, loss[loss=0.09368, beats_loss=0.01407, ecapa_loss=0.0002382, whisper_loss=0.07723, over 15605.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01225, ecapa_loss=0.000262, whisper_loss=0.09595, over 3790096.84 frames. ], batch size: 64, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:47:22,034 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 30 from Vox, 36 from AS 2024-08-10 07:47:50,945 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 3.044e+01 3.412e+01 3.944e+01 6.015e+01, threshold=6.823e+01, percent-clipped=0.0 2024-08-10 07:47:55,230 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 from AS 2024-08-10 07:48:09,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=447070.0, ans=0.125 2024-08-10 07:48:28,403 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1250, loss[loss=0.1025, beats_loss=0.01314, ecapa_loss=0.0002779, whisper_loss=0.08663, over 15365.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.0122, ecapa_loss=0.0002622, whisper_loss=0.0957, over 3781110.40 frames. ], batch size: 62, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:48:36,161 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=22.5 2024-08-10 07:48:39,943 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
26 from LS+wenet, 27 from Vox, 27 from AS 2024-08-10 07:48:40,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=447270.0, ans=0.5 2024-08-10 07:48:41,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=447370.0, ans=0.125 2024-08-10 07:48:44,302 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:48:51,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=447370.0, ans=0.0 2024-08-10 07:48:52,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=447370.0, ans=0.125 2024-08-10 07:48:54,806 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.61 vs. limit=15.0 2024-08-10 07:49:20,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=447570.0, ans=0.0 2024-08-10 07:49:20,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=447570.0, ans=0.125 2024-08-10 07:49:33,155 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 from AS 2024-08-10 07:49:33,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=447670.0, ans=0.0 2024-08-10 07:49:39,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=447770.0, ans=0.125 2024-08-10 07:49:39,973 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1300, loss[loss=0.1177, beats_loss=0.01046, ecapa_loss=0.0003308, whisper_loss=0.104, over 21234.00 frames. 
], tot_loss[loss=0.11, beats_loss=0.01225, ecapa_loss=0.0002615, whisper_loss=0.09509, over 3806114.76 frames. ], batch size: 87, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:49:53,132 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.726e+00 2024-08-10 07:49:55,461 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 from AS 2024-08-10 07:50:08,452 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 from AS 2024-08-10 07:50:10,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=447970.0, ans=0.125 2024-08-10 07:50:11,143 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 25 from Vox, 28 from AS 2024-08-10 07:50:11,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=447970.0, ans=0.125 2024-08-10 07:50:12,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 3.001e+01 3.337e+01 3.796e+01 6.277e+01, threshold=6.674e+01, percent-clipped=0.0 2024-08-10 07:50:14,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=447970.0, ans=0.1 2024-08-10 07:50:22,584 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 22 from Vox, 28 from AS 2024-08-10 07:50:28,183 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 from AS 2024-08-10 07:50:30,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.28 vs. limit=22.5 2024-08-10 07:50:31,068 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
23 from LS+wenet, 23 from Vox, 46 from AS 2024-08-10 07:50:51,270 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1350, loss[loss=0.1237, beats_loss=0.01134, ecapa_loss=0.000311, whisper_loss=0.1092, over 21357.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.0122, ecapa_loss=0.0002609, whisper_loss=0.09505, over 3810831.27 frames. ], batch size: 91, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:51:12,060 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 20 from Vox, 34 from AS 2024-08-10 07:51:15,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=448370.0, ans=0.0 2024-08-10 07:51:24,284 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 10 from LS+wenet, 21 from Vox, 31 from AS 2024-08-10 07:51:24,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=448470.0, ans=0.07 2024-08-10 07:51:27,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=448470.0, ans=0.0 2024-08-10 07:52:00,653 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 22 from LS+wenet, 23 from Vox, 50 from AS 2024-08-10 07:52:03,524 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1400, loss[loss=0.1202, beats_loss=0.01229, ecapa_loss=0.0002283, whisper_loss=0.1056, over 18362.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01211, ecapa_loss=0.0002606, whisper_loss=0.0953, over 3814270.60 frames. ], batch size: 68, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:52:33,544 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. 
limit=15.0 2024-08-10 07:52:37,486 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.977e+01 3.358e+01 3.939e+01 6.744e+01, threshold=6.717e+01, percent-clipped=2.0 2024-08-10 07:52:51,300 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:52:51,678 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=15.0 2024-08-10 07:52:52,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=449070.0, ans=0.125 2024-08-10 07:52:56,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=449070.0, ans=0.125 2024-08-10 07:52:56,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=449070.0, ans=0.2 2024-08-10 07:52:59,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=449070.0, ans=0.0 2024-08-10 07:53:17,575 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1450, loss[loss=0.09977, beats_loss=0.01481, ecapa_loss=0.0002815, whisper_loss=0.08215, over 14279.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01212, ecapa_loss=0.0002605, whisper_loss=0.09514, over 3785066.97 frames. ], batch size: 62, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:53:50,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=449270.0, ans=0.0 2024-08-10 07:53:55,724 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
32 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 07:54:03,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=449370.0, ans=0.0 2024-08-10 07:54:12,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=449370.0, ans=0.2 2024-08-10 07:54:28,817 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2024-08-10 07:54:49,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5 2024-08-10 07:54:59,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.45 vs. limit=22.5 2024-08-10 07:55:00,474 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1500, loss[loss=0.09998, beats_loss=0.01464, ecapa_loss=0.0002378, whisper_loss=0.08296, over 22631.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01207, ecapa_loss=0.00026, whisper_loss=0.095, over 3796553.98 frames. ], batch size: 90, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:55:24,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=449870.0, ans=0.125 2024-08-10 07:55:35,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=449970.0, ans=15.0 2024-08-10 07:55:35,947 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+01 2.938e+01 3.327e+01 3.975e+01 6.102e+01, threshold=6.654e+01, percent-clipped=0.0 2024-08-10 07:55:48,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. 
limit=6.0 2024-08-10 07:55:51,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=450070.0, ans=0.125 2024-08-10 07:56:04,924 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 from AS 2024-08-10 07:56:15,691 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 18 from Vox, 37 from AS 2024-08-10 07:56:16,681 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1550, loss[loss=0.1139, beats_loss=0.01256, ecapa_loss=0.000246, whisper_loss=0.09889, over 20918.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01213, ecapa_loss=0.0002593, whisper_loss=0.09472, over 3789350.83 frames. ], batch size: 83, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:56:20,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=450270.0, ans=0.2 2024-08-10 07:56:26,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=450270.0, ans=0.125 2024-08-10 07:56:53,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=450470.0, ans=0.125 2024-08-10 07:57:07,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=450570.0, ans=0.125 2024-08-10 07:57:10,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=450570.0, ans=0.125 2024-08-10 07:57:17,656 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 17 from Vox, 32 from AS 2024-08-10 07:57:32,244 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1600, loss[loss=0.1137, beats_loss=0.01326, ecapa_loss=0.0002721, whisper_loss=0.09776, over 18410.00 frames. 
], tot_loss[loss=0.11, beats_loss=0.01207, ecapa_loss=0.0002598, whisper_loss=0.09533, over 3811676.39 frames. ], batch size: 73, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:57:44,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=450770.0, ans=0.125 2024-08-10 07:58:07,100 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 3.094e+01 3.435e+01 3.999e+01 7.884e+01, threshold=6.871e+01, percent-clipped=1.0 2024-08-10 07:58:08,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=450970.0, ans=0.125 2024-08-10 07:58:09,028 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 from AS 2024-08-10 07:58:10,790 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 15 from Vox, 22 from AS 2024-08-10 07:58:19,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=451070.0, ans=0.125 2024-08-10 07:58:19,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=451070.0, ans=0.0 2024-08-10 07:58:30,527 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 from AS 2024-08-10 07:58:41,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=451170.0, ans=0.0 2024-08-10 07:58:46,855 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1650, loss[loss=0.09681, beats_loss=0.01367, ecapa_loss=0.0002107, whisper_loss=0.08103, over 18667.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01211, ecapa_loss=0.0002594, whisper_loss=0.09533, over 3827705.89 frames. ], batch size: 74, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:58:48,661 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
18 from LS+wenet, 16 from Vox, 26 from AS 2024-08-10 07:58:53,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=451270.0, ans=0.0 2024-08-10 07:58:59,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.72 vs. limit=15.0 2024-08-10 07:59:00,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=451370.0, ans=0.125 2024-08-10 07:59:22,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=451470.0, ans=0.125 2024-08-10 07:59:35,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=451570.0, ans=0.125 2024-08-10 07:59:45,835 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2024-08-10 07:59:56,649 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 07:59:58,959 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1700, loss[loss=0.1336, beats_loss=0.007997, ecapa_loss=0.0002507, whisper_loss=0.1231, over 16390.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01204, ecapa_loss=0.0002581, whisper_loss=0.0963, over 3831761.89 frames. ], batch size: 60, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 08:00:03,596 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 10 from LS+wenet, 17 from Vox, 26 from AS 2024-08-10 08:00:06,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.29 vs. limit=12.0 2024-08-10 08:00:12,416 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
18 from LS+wenet, 21 from Vox, 27 from AS 2024-08-10 08:00:29,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=12.0 2024-08-10 08:00:31,358 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+01 3.130e+01 3.389e+01 3.948e+01 7.641e+01, threshold=6.778e+01, percent-clipped=2.0 2024-08-10 08:00:33,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.48 vs. limit=22.5 2024-08-10 08:00:41,337 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 from AS 2024-08-10 08:00:43,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=452070.0, ans=0.125 2024-08-10 08:00:50,437 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.310e-03 2024-08-10 08:00:58,686 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 from AS 2024-08-10 08:01:08,910 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1750, loss[loss=0.1087, beats_loss=0.01141, ecapa_loss=0.0002512, whisper_loss=0.09475, over 20891.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01202, ecapa_loss=0.0002596, whisper_loss=0.09565, over 3844921.36 frames. ], batch size: 82, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 08:01:09,064 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 16 from Vox, 22 from AS 2024-08-10 08:01:17,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=452270.0, ans=0.125 2024-08-10 08:01:18,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.83 vs. 
limit=15.0 2024-08-10 08:01:41,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=452470.0, ans=0.125 2024-08-10 08:01:56,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=452570.0, ans=0.125 2024-08-10 08:01:57,424 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 22 from Vox, 35 from AS 2024-08-10 08:02:03,175 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 16 from Vox, 31 from AS 2024-08-10 08:02:12,905 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 25 from Vox, 26 from AS 2024-08-10 08:02:18,111 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1800, loss[loss=0.1165, beats_loss=0.01175, ecapa_loss=0.0002462, whisper_loss=0.1023, over 14723.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01208, ecapa_loss=0.0002589, whisper_loss=0.09561, over 3855948.48 frames. ], batch size: 57, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 08:02:45,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=452970.0, ans=0.125 2024-08-10 08:02:49,237 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 3.196e+01 3.582e+01 4.110e+01 5.783e+01, threshold=7.164e+01, percent-clipped=0.0 2024-08-10 08:03:02,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=453070.0, ans=0.125 2024-08-10 08:03:07,566 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.23 vs. 
limit=10.0 2024-08-10 08:03:18,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=453170.0, ans=0.0 2024-08-10 08:03:21,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=453170.0, ans=0.125 2024-08-10 08:03:21,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. limit=6.0 2024-08-10 08:03:25,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=453270.0, ans=0.125 2024-08-10 08:03:25,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=453270.0, ans=0.0 2024-08-10 08:03:26,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1850, loss[loss=0.1193, beats_loss=0.01215, ecapa_loss=0.0002267, whisper_loss=0.1049, over 23857.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01208, ecapa_loss=0.0002592, whisper_loss=0.09513, over 3835733.70 frames. ], batch size: 92, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:03:42,093 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 16 from Vox, 31 from AS 2024-08-10 08:03:43,384 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 22 from Vox, 20 from AS 2024-08-10 08:03:45,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=453370.0, ans=0.0 2024-08-10 08:03:50,561 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 24 from Vox, 19 from AS 2024-08-10 08:03:56,555 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
27 from LS+wenet, 19 from Vox, 28 from AS 2024-08-10 08:04:14,924 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 08:04:22,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=453570.0, ans=0.125 2024-08-10 08:04:39,092 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1900, loss[loss=0.1209, beats_loss=0.01229, ecapa_loss=0.0002172, whisper_loss=0.1064, over 19146.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01202, ecapa_loss=0.0002644, whisper_loss=0.09549, over 3824463.60 frames. ], batch size: 71, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:04:44,679 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 from AS 2024-08-10 08:04:47,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=453770.0, ans=0.125 2024-08-10 08:05:01,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.56 vs. limit=12.0 2024-08-10 08:05:06,782 INFO [train_multi_KD3.py:844] (2/4) A total of 97 cuts. 25 from LS+wenet, 24 from Vox, 48 from AS 2024-08-10 08:05:08,041 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
17 from LS+wenet, 19 from Vox, 22 from AS 2024-08-10 08:05:10,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 3.027e+01 3.393e+01 3.845e+01 7.336e+01, threshold=6.786e+01, percent-clipped=1.0 2024-08-10 08:05:29,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=454070.0, ans=0.5 2024-08-10 08:05:38,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454170.0, ans=0.1 2024-08-10 08:05:49,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 1950, loss[loss=0.09593, beats_loss=0.01347, ecapa_loss=0.0003146, whisper_loss=0.07931, over 21436.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01208, ecapa_loss=0.0002682, whisper_loss=0.09546, over 3804810.60 frames. ], batch size: 94, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:06:04,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2024-08-10 08:06:11,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=454370.0, ans=0.2 2024-08-10 08:06:12,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=454370.0, ans=0.0 2024-08-10 08:06:13,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=454370.0, ans=0.2 2024-08-10 08:06:17,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.32 vs. 
limit=15.0 2024-08-10 08:06:27,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=454470.0, ans=0.125 2024-08-10 08:06:29,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=454470.0, ans=0.2 2024-08-10 08:06:32,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=454570.0, ans=0.125 2024-08-10 08:06:39,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=454570.0, ans=0.125 2024-08-10 08:06:49,329 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 from AS 2024-08-10 08:07:00,757 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2000, loss[loss=0.1003, beats_loss=0.01302, ecapa_loss=0.0003115, whisper_loss=0.08415, over 19630.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01211, ecapa_loss=0.0002712, whisper_loss=0.09538, over 3787751.79 frames. ], batch size: 82, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:07:17,326 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 from AS 2024-08-10 08:07:28,717 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 17 from LS+wenet, 20 from Vox, 42 from AS 2024-08-10 08:07:34,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.624e+01 3.304e+01 3.702e+01 4.234e+01 5.771e+01, threshold=7.405e+01, percent-clipped=0.0 2024-08-10 08:07:43,569 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
20 from LS+wenet, 29 from Vox, 43 from AS 2024-08-10 08:07:59,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=455170.0, ans=0.0 2024-08-10 08:08:13,083 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2050, loss[loss=0.1168, beats_loss=0.009344, ecapa_loss=0.0003257, whisper_loss=0.1042, over 18739.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01212, ecapa_loss=0.0002716, whisper_loss=0.09528, over 3791350.90 frames. ], batch size: 73, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:08:14,663 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 26 from LS+wenet, 16 from Vox, 53 from AS 2024-08-10 08:08:29,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=455370.0, ans=0.125 2024-08-10 08:08:54,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=455570.0, ans=0.125 2024-08-10 08:09:03,811 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 17 from Vox, 33 from AS 2024-08-10 08:09:16,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=455670.0, ans=0.125 2024-08-10 08:09:17,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=455670.0, ans=0.125 2024-08-10 08:09:18,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-10 08:09:24,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2100, loss[loss=0.09418, beats_loss=0.01328, ecapa_loss=0.0002644, whisper_loss=0.07825, over 15160.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01219, ecapa_loss=0.0002703, whisper_loss=0.09513, over 3808809.49 frames. 
], batch size: 59, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:09:32,749 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 26 from Vox, 25 from AS 2024-08-10 08:09:39,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=455870.0, ans=0.2 2024-08-10 08:09:44,866 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 from AS 2024-08-10 08:09:57,650 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.943e+01 3.340e+01 3.951e+01 7.714e+01, threshold=6.679e+01, percent-clipped=1.0 2024-08-10 08:10:01,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=455970.0, ans=0.125 2024-08-10 08:10:08,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=456070.0, ans=0.0 2024-08-10 08:10:18,901 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 from AS 2024-08-10 08:10:19,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=456070.0, ans=0.125 2024-08-10 08:10:21,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=456170.0, ans=0.125 2024-08-10 08:10:26,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2024-08-10 08:10:31,449 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
23 from LS+wenet, 22 from Vox, 33 from AS 2024-08-10 08:10:35,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=456270.0, ans=0.0 2024-08-10 08:10:35,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=456270.0, ans=0.1 2024-08-10 08:10:36,797 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2150, loss[loss=0.1108, beats_loss=0.01102, ecapa_loss=0.0002859, whisper_loss=0.09693, over 24126.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01211, ecapa_loss=0.0002724, whisper_loss=0.09577, over 3780970.15 frames. ], batch size: 93, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:10:42,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=456270.0, ans=0.125 2024-08-10 08:10:47,726 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 from AS 2024-08-10 08:10:47,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=456270.0, ans=0.1 2024-08-10 08:11:03,863 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 17 from Vox, 40 from AS 2024-08-10 08:11:06,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2024-08-10 08:11:20,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=456570.0, ans=0.1 2024-08-10 08:11:24,999 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2024-08-10 08:11:26,238 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
32 from LS+wenet, 24 from Vox, 31 from AS 2024-08-10 08:11:43,310 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 from AS 2024-08-10 08:11:48,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=456670.0, ans=0.125 2024-08-10 08:11:51,034 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2200, loss[loss=0.1319, beats_loss=0.009965, ecapa_loss=0.0002602, whisper_loss=0.1194, over 24547.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01211, ecapa_loss=0.0002727, whisper_loss=0.09679, over 3820970.14 frames. ], batch size: 89, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:12:26,127 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 3.107e+01 3.618e+01 4.202e+01 6.900e+01, threshold=7.235e+01, percent-clipped=1.0 2024-08-10 08:12:26,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=456970.0, ans=0.125 2024-08-10 08:12:31,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=456970.0, ans=0.1 2024-08-10 08:12:46,558 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS 2024-08-10 08:12:58,799 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 32 from LS+wenet, 24 from Vox, 23 from AS 2024-08-10 08:13:04,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=457270.0, ans=0.0 2024-08-10 08:13:05,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2250, loss[loss=0.1338, beats_loss=0.0123, ecapa_loss=0.0003144, whisper_loss=0.1184, over 22442.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.0122, ecapa_loss=0.0002723, whisper_loss=0.09749, over 3864927.69 frames. 
], batch size: 89, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:13:05,726 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 34 from Vox, 32 from AS 2024-08-10 08:13:07,263 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 from AS 2024-08-10 08:13:45,850 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 37 from LS+wenet, 20 from Vox, 30 from AS 2024-08-10 08:13:58,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=457570.0, ans=0.125 2024-08-10 08:13:59,173 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 16 from LS+wenet, 18 from Vox, 38 from AS 2024-08-10 08:14:21,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2300, loss[loss=0.07757, beats_loss=0.01609, ecapa_loss=0.0002693, whisper_loss=0.05878, over 12884.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01218, ecapa_loss=0.000273, whisper_loss=0.09782, over 3861323.66 frames. ], batch size: 54, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:14:53,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457970.0, ans=0.1 2024-08-10 08:14:56,707 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 3.052e+01 3.526e+01 3.987e+01 6.394e+01, threshold=7.053e+01, percent-clipped=0.0 2024-08-10 08:15:31,527 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 28 from Vox, 28 from AS 2024-08-10 08:15:37,211 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2350, loss[loss=0.1257, beats_loss=0.01254, ecapa_loss=0.0002751, whisper_loss=0.1105, over 20589.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01204, ecapa_loss=0.0002756, whisper_loss=0.09892, over 3857435.82 frames. 
], batch size: 82, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:15:39,154 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.175e-01 2024-08-10 08:15:46,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=458270.0, ans=0.125 2024-08-10 08:15:51,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=458370.0, ans=0.125 2024-08-10 08:15:56,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=458370.0, ans=0.2 2024-08-10 08:15:56,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=458370.0, ans=0.0 2024-08-10 08:16:03,920 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2024-08-10 08:16:06,073 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 24 from Vox, 31 from AS 2024-08-10 08:16:19,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=458470.0, ans=0.125 2024-08-10 08:16:26,237 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.13 vs. limit=10.0 2024-08-10 08:16:56,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2400, loss[loss=0.1181, beats_loss=0.01256, ecapa_loss=0.000289, whisper_loss=0.1027, over 21802.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01196, ecapa_loss=0.0002752, whisper_loss=0.09871, over 3863464.61 frames. 
], batch size: 90, lr: 1.57e-02, grad_scale: 134217728.0 2024-08-10 08:17:11,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=458870.0, ans=0.0 2024-08-10 08:17:27,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=458970.0, ans=0.125 2024-08-10 08:17:29,648 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.868e+01 3.229e+01 3.686e+01 5.514e+01, threshold=6.458e+01, percent-clipped=0.0 2024-08-10 08:17:52,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.93 vs. limit=10.0 2024-08-10 08:17:56,193 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 27 from LS+wenet, 12 from Vox, 17 from AS 2024-08-10 08:18:11,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=459170.0, ans=0.125 2024-08-10 08:18:18,746 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2450, loss[loss=0.1092, beats_loss=0.01453, ecapa_loss=0.0002556, whisper_loss=0.09206, over 23107.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.012, ecapa_loss=0.0002751, whisper_loss=0.0977, over 3854018.57 frames. ], batch size: 92, lr: 1.57e-02, grad_scale: 134217728.0 2024-08-10 08:18:41,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=459370.0, ans=0.125 2024-08-10 08:19:25,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.73 vs. 
limit=15.0 2024-08-10 08:19:28,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=459670.0, ans=0.1 2024-08-10 08:19:41,787 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2500, loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.0002735, whisper_loss=0.09129, over 14611.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01201, ecapa_loss=0.0002765, whisper_loss=0.09805, over 3860863.93 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 134217728.0 2024-08-10 08:19:49,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=459770.0, ans=0.125 2024-08-10 08:19:59,147 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.121e+03 2024-08-10 08:20:08,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=459870.0, ans=0.0 2024-08-10 08:20:20,034 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 23 from Vox, 21 from AS 2024-08-10 08:20:31,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.999e+01 3.542e+01 3.925e+01 6.520e+01, threshold=7.085e+01, percent-clipped=1.0 2024-08-10 08:20:31,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=459970.0, ans=0.09899494936611666 2024-08-10 08:20:37,893 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS 2024-08-10 08:20:42,571 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 21 from Vox, 14 from AS 2024-08-10 08:21:22,956 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 24 from Vox, 39 from AS 2024-08-10 08:21:23,675 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0 2024-08-10 08:21:25,546 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2550, loss[loss=0.1295, beats_loss=0.01084, ecapa_loss=0.000212, whisper_loss=0.1166, over 22520.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01196, ecapa_loss=0.0002773, whisper_loss=0.09872, over 3898597.02 frames. ], batch size: 85, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:21:40,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=460270.0, ans=0.125 2024-08-10 08:21:54,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=460370.0, ans=0.2 2024-08-10 08:21:54,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=460370.0, ans=0.0 2024-08-10 08:21:57,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=460370.0, ans=0.125 2024-08-10 08:22:08,692 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 15 from LS+wenet, 17 from Vox, 37 from AS 2024-08-10 08:22:42,289 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 13 from Vox, 32 from AS 2024-08-10 08:23:07,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=460770.0, ans=0.125 2024-08-10 08:23:08,616 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2600, loss[loss=0.119, beats_loss=0.01152, ecapa_loss=0.0002673, whisper_loss=0.1048, over 22570.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01205, ecapa_loss=0.0002756, whisper_loss=0.09798, over 3900728.61 frames. 
], batch size: 90, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:23:48,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=460870.0, ans=10.0 2024-08-10 08:23:54,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=460970.0, ans=0.2 2024-08-10 08:24:01,382 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 3.079e+01 3.425e+01 3.855e+01 5.495e+01, threshold=6.850e+01, percent-clipped=0.0 2024-08-10 08:24:01,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=460970.0, ans=0.0 2024-08-10 08:24:22,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=461070.0, ans=0.125 2024-08-10 08:24:28,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=461070.0, ans=0.125 2024-08-10 08:24:30,255 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 12 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-10 08:24:34,886 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 08:24:44,170 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 18 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 08:24:58,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=461170.0, ans=0.125 2024-08-10 08:25:03,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2650, loss[loss=0.08495, beats_loss=0.01241, ecapa_loss=0.0002902, whisper_loss=0.06964, over 13559.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01203, ecapa_loss=0.0002765, whisper_loss=0.09727, over 3882948.75 frames. 
], batch size: 54, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:25:49,544 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 08:26:34,972 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-10 08:26:41,043 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.23 vs. limit=15.0 2024-08-10 08:26:41,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=461670.0, ans=0.1 2024-08-10 08:26:47,031 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 08:26:55,649 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.85 vs. limit=15.0 2024-08-10 08:26:57,677 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2700, loss[loss=0.09558, beats_loss=0.01169, ecapa_loss=0.0002697, whisper_loss=0.0812, over 16047.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.0122, ecapa_loss=0.0002763, whisper_loss=0.0966, over 3888779.21 frames. ], batch size: 65, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:27:02,032 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 08:27:18,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.64 vs. limit=22.5 2024-08-10 08:27:33,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.69 vs. 
limit=15.0 2024-08-10 08:27:42,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=461970.0, ans=0.1 2024-08-10 08:27:45,151 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 08:27:48,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.370e+01 3.222e+01 3.601e+01 4.234e+01 3.838e+02, threshold=7.201e+01, percent-clipped=7.0 2024-08-10 08:27:58,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=462070.0, ans=0.125 2024-08-10 08:28:11,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=462070.0, ans=0.025 2024-08-10 08:28:17,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=462170.0, ans=0.125 2024-08-10 08:28:24,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=462170.0, ans=0.125 2024-08-10 08:28:28,826 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 08:28:30,389 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2750, loss[loss=0.1319, beats_loss=0.009632, ecapa_loss=0.0002837, whisper_loss=0.1194, over 17829.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01222, ecapa_loss=0.0002767, whisper_loss=0.09617, over 3864449.20 frames. ], batch size: 66, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:28:46,672 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
22 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 08:29:05,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=462470.0, ans=0.1 2024-08-10 08:29:14,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=462570.0, ans=0.125 2024-08-10 08:29:19,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.58 vs. limit=10.0 2024-08-10 08:29:31,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=462670.0, ans=0.125 2024-08-10 08:29:40,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=462670.0, ans=0.2 2024-08-10 08:29:43,623 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 08:29:45,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2800, loss[loss=0.1157, beats_loss=0.01191, ecapa_loss=0.0003559, whisper_loss=0.1002, over 21448.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01221, ecapa_loss=0.0002763, whisper_loss=0.09647, over 3876034.64 frames. ], batch size: 92, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:29:49,837 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.601e-01 2024-08-10 08:29:53,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=462770.0, ans=0.0 2024-08-10 08:29:55,035 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 08:29:58,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=462770.0, ans=0.2 2024-08-10 08:30:00,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=462870.0, ans=0.2 2024-08-10 08:30:17,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=462970.0, ans=0.0 2024-08-10 08:30:19,950 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 3.197e+01 3.685e+01 4.218e+01 5.823e+01, threshold=7.371e+01, percent-clipped=0.0 2024-08-10 08:30:20,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=462970.0, ans=0.125 2024-08-10 08:30:24,786 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-10 08:30:25,613 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 08:30:26,983 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.64 vs. limit=22.5 2024-08-10 08:30:30,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.60 vs. limit=15.0 2024-08-10 08:30:49,955 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-10 08:30:54,943 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 26 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 08:31:01,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2850, loss[loss=0.1358, beats_loss=0.009542, ecapa_loss=0.0002447, whisper_loss=0.1238, over 15937.00 frames. 
], tot_loss[loss=0.1123, beats_loss=0.01224, ecapa_loss=0.0002733, whisper_loss=0.09728, over 3864027.82 frames. ], batch size: 57, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:31:02,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=463270.0, ans=0.2 2024-08-10 08:31:15,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=463370.0, ans=0.0 2024-08-10 08:31:16,099 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 08:31:16,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=463370.0, ans=0.0 2024-08-10 08:31:18,761 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2024-08-10 08:31:31,108 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-10 08:31:34,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2024-08-10 08:31:52,794 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 08:32:08,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.87 vs. limit=6.0 2024-08-10 08:32:09,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0 2024-08-10 08:32:24,073 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2900, loss[loss=0.09557, beats_loss=0.01145, ecapa_loss=0.0003316, whisper_loss=0.08081, over 14906.00 frames. 
], tot_loss[loss=0.1124, beats_loss=0.0123, ecapa_loss=0.0002761, whisper_loss=0.09732, over 3862686.67 frames. ], batch size: 62, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:32:31,834 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 08:32:45,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=463870.0, ans=0.0 2024-08-10 08:33:05,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.474e+01 3.004e+01 3.404e+01 3.788e+01 1.422e+02, threshold=6.807e+01, percent-clipped=1.0 2024-08-10 08:33:06,111 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-10 08:33:20,920 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0 2024-08-10 08:33:30,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=464070.0, ans=0.125 2024-08-10 08:33:30,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=464070.0, ans=10.0 2024-08-10 08:33:40,109 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.90 vs. limit=22.5 2024-08-10 08:33:55,626 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 2950, loss[loss=0.1142, beats_loss=0.01236, ecapa_loss=0.0002691, whisper_loss=0.09915, over 20728.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01226, ecapa_loss=0.0002775, whisper_loss=0.09775, over 3864253.63 frames. 
], batch size: 82, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:34:11,235 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.135e-02 2024-08-10 08:34:29,218 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-10 08:34:35,808 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 08:34:52,549 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-10 08:34:54,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=464570.0, ans=0.125 2024-08-10 08:34:59,707 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-10 08:35:01,461 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 28 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 08:35:05,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2024-08-10 08:35:10,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=464670.0, ans=0.2 2024-08-10 08:35:13,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=464670.0, ans=0.07 2024-08-10 08:35:27,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3000, loss[loss=0.08518, beats_loss=0.0151, ecapa_loss=0.0002335, whisper_loss=0.06774, over 14452.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01225, ecapa_loss=0.0002772, whisper_loss=0.09786, over 3884686.83 frames. 
], batch size: 59, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:35:27,858 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 08:36:05,664 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on ASR_libri: loss=0.2648, beats_loss=0, ecapa_loss=0.0008316, whisper_loss=0.2565, over 922467.00 frames. 2024-08-10 08:36:23,265 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on SV_voxceleb1: loss=0.007277, beats_loss=0, ecapa_loss=0.0007277, whisper_loss=0, over 939242.00 frames. 2024-08-10 08:37:01,465 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.5670, 1.4070, 1.6740, 1.0736, 2.1542, 1.3897, 1.4931, 1.6002], device='cuda:2') 2024-08-10 08:37:57,866 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1075, 4.6997, 4.1235, 4.5514], device='cuda:2') 2024-08-10 08:38:19,752 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on AT_audioset: loss=0.0279, beats_loss=0.0279, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 08:38:19,756 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 08:38:37,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=464870.0, ans=0.1 2024-08-10 08:38:57,308 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 3.167e+01 3.615e+01 4.298e+01 8.066e+01, threshold=7.230e+01, percent-clipped=1.0 2024-08-10 08:39:02,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=464970.0, ans=0.125 2024-08-10 08:39:13,926 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 08:39:17,802 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 08:39:33,151 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-10 08:39:35,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.53 vs. limit=15.0 2024-08-10 08:39:40,932 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3050, loss[loss=0.1114, beats_loss=0.01406, ecapa_loss=0.0002443, whisper_loss=0.09491, over 23104.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01221, ecapa_loss=0.0002781, whisper_loss=0.09819, over 3915374.52 frames. ], batch size: 90, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:39:46,516 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 08:40:08,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.37 vs. limit=15.0 2024-08-10 08:40:19,082 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-10 08:40:25,633 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 08:40:30,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=465570.0, ans=0.125 2024-08-10 08:40:34,108 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 35 from Vox, 25 fro AS 2024-08-10 08:40:45,610 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 08:41:03,697 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3100, loss[loss=0.1093, beats_loss=0.01061, ecapa_loss=0.0003043, whisper_loss=0.09562, over 15082.00 frames. 
], tot_loss[loss=0.1131, beats_loss=0.01217, ecapa_loss=0.0002809, whisper_loss=0.0981, over 3904144.93 frames. ], batch size: 60, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:41:10,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=465770.0, ans=0.125 2024-08-10 08:41:34,353 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.28 vs. limit=22.5 2024-08-10 08:41:43,768 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 3.398e+01 3.878e+01 4.582e+01 1.719e+02, threshold=7.756e+01, percent-clipped=2.0 2024-08-10 08:41:49,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=465970.0, ans=0.025 2024-08-10 08:42:09,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=466070.0, ans=0.0 2024-08-10 08:42:33,424 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3150, loss[loss=0.1364, beats_loss=0.01181, ecapa_loss=0.0002593, whisper_loss=0.122, over 21841.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.0123, ecapa_loss=0.0002802, whisper_loss=0.09797, over 3898854.11 frames. ], batch size: 86, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:42:35,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=466270.0, ans=0.2 2024-08-10 08:42:45,445 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.65 vs. 
limit=15.0 2024-08-10 08:43:01,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=466370.0, ans=0.0 2024-08-10 08:43:15,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=466470.0, ans=0.0 2024-08-10 08:43:17,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=466470.0, ans=0.125 2024-08-10 08:43:24,505 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 08:43:24,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=466570.0, ans=0.0 2024-08-10 08:43:40,222 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-10 08:43:47,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=466670.0, ans=0.125 2024-08-10 08:43:47,974 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.14 vs. limit=15.0 2024-08-10 08:43:58,710 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3200, loss[loss=0.1096, beats_loss=0.01306, ecapa_loss=0.0002639, whisper_loss=0.09388, over 23557.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01228, ecapa_loss=0.0002789, whisper_loss=0.09743, over 3893162.72 frames. ], batch size: 96, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:44:00,235 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
19 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-10 08:44:02,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=466770.0, ans=0.125 2024-08-10 08:44:11,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=466770.0, ans=0.05 2024-08-10 08:44:24,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=466870.0, ans=0.125 2024-08-10 08:44:30,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=466870.0, ans=0.1 2024-08-10 08:44:40,700 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 3.101e+01 3.705e+01 4.309e+01 1.166e+02, threshold=7.411e+01, percent-clipped=1.0 2024-08-10 08:44:55,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=467070.0, ans=0.125 2024-08-10 08:45:03,194 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 08:45:12,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.15 vs. limit=15.0 2024-08-10 08:45:32,871 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3250, loss[loss=0.1057, beats_loss=0.009883, ecapa_loss=0.000309, whisper_loss=0.09273, over 20355.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01224, ecapa_loss=0.00028, whisper_loss=0.09759, over 3895223.27 frames. ], batch size: 78, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:45:48,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=467270.0, ans=0.0 2024-08-10 08:46:02,449 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 08:46:28,896 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 08:46:29,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=467570.0, ans=0.125 2024-08-10 08:46:30,605 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 08:46:50,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=467670.0, ans=0.125 2024-08-10 08:46:59,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=467670.0, ans=0.125 2024-08-10 08:47:04,033 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3300, loss[loss=0.1033, beats_loss=0.011, ecapa_loss=0.0003156, whisper_loss=0.0891, over 20626.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.0122, ecapa_loss=0.0002802, whisper_loss=0.09772, over 3906585.55 frames. ], batch size: 87, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:47:24,679 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 08:47:30,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=467870.0, ans=0.0 2024-08-10 08:47:40,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=15.0 2024-08-10 08:47:46,892 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+01 3.041e+01 3.344e+01 3.812e+01 6.169e+01, threshold=6.688e+01, percent-clipped=0.0 2024-08-10 08:47:52,078 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 08:48:21,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=468170.0, ans=0.125 2024-08-10 08:48:23,758 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 26 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-10 08:48:25,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468170.0, ans=0.1 2024-08-10 08:48:34,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3350, loss[loss=0.1053, beats_loss=0.01061, ecapa_loss=0.0002373, whisper_loss=0.09232, over 23590.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01216, ecapa_loss=0.0002791, whisper_loss=0.09709, over 3907613.49 frames. ], batch size: 93, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:48:34,395 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 24 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-10 08:48:48,680 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 16 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-10 08:48:56,751 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 26 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-10 08:49:03,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=468370.0, ans=0.125 2024-08-10 08:49:14,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468470.0, ans=0.1 2024-08-10 08:49:30,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=468570.0, ans=0.125 2024-08-10 08:49:58,916 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3400, loss[loss=0.05511, beats_loss=0.0134, ecapa_loss=0.0002621, whisper_loss=0.03908, over 14536.00 frames. 
], tot_loss[loss=0.1117, beats_loss=0.01218, ecapa_loss=0.0002763, whisper_loss=0.09673, over 3903243.78 frames. ], batch size: 55, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:50:03,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.47 vs. limit=5.0 2024-08-10 08:50:11,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=468770.0, ans=0.05 2024-08-10 08:50:32,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.63 vs. limit=15.0 2024-08-10 08:50:34,196 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+01 3.156e+01 3.587e+01 4.181e+01 1.855e+02, threshold=7.174e+01, percent-clipped=2.0 2024-08-10 08:50:34,335 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 08:50:49,860 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 08:50:52,963 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-10 08:50:53,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=469070.0, ans=0.0 2024-08-10 08:50:53,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=469070.0, ans=0.125 2024-08-10 08:51:08,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. 
limit=15.0 2024-08-10 08:51:13,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469170.0, ans=0.1 2024-08-10 08:51:13,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=469170.0, ans=10.0 2024-08-10 08:51:15,951 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3450, loss[loss=0.1139, beats_loss=0.01167, ecapa_loss=0.0002741, whisper_loss=0.09946, over 16982.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01215, ecapa_loss=0.0002794, whisper_loss=0.09701, over 3899619.24 frames. ], batch size: 66, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:51:22,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=469270.0, ans=0.0 2024-08-10 08:51:27,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.87 vs. limit=10.0 2024-08-10 08:51:36,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.74 vs. limit=15.0 2024-08-10 08:51:37,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=469370.0, ans=0.125 2024-08-10 08:52:05,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=469570.0, ans=0.125 2024-08-10 08:52:29,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3500, loss[loss=0.1026, beats_loss=0.01179, ecapa_loss=0.0002871, whisper_loss=0.08791, over 22433.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01214, ecapa_loss=0.0002787, whisper_loss=0.09673, over 3890986.82 frames. ], batch size: 89, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:52:29,914 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 08:52:33,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=469770.0, ans=0.125 2024-08-10 08:52:54,206 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 12 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 08:52:54,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469870.0, ans=0.1 2024-08-10 08:52:58,847 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.236e-02 2024-08-10 08:53:04,079 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 3.037e+01 3.390e+01 3.981e+01 6.541e+01, threshold=6.780e+01, percent-clipped=0.0 2024-08-10 08:53:09,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=469970.0, ans=0.0 2024-08-10 08:53:14,961 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 08:53:16,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=470070.0, ans=0.125 2024-08-10 08:53:24,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.22 vs. limit=10.0 2024-08-10 08:53:35,248 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 08:53:44,571 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3550, loss[loss=0.06549, beats_loss=0.01654, ecapa_loss=0.000184, whisper_loss=0.04711, over 14899.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01214, ecapa_loss=0.0002783, whisper_loss=0.0963, over 3868314.43 frames. 
], batch size: 57, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:53:46,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=470270.0, ans=0.125 2024-08-10 08:54:01,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=470370.0, ans=0.125 2024-08-10 08:54:07,661 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 08:54:15,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.22 vs. limit=22.5 2024-08-10 08:54:33,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=470570.0, ans=0.125 2024-08-10 08:54:34,645 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 08:54:45,741 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 08:54:48,605 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-10 08:54:50,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.95 vs. limit=15.0 2024-08-10 08:54:54,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=470670.0, ans=0.0 2024-08-10 08:54:57,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3600, loss[loss=0.1234, beats_loss=0.01161, ecapa_loss=0.0002312, whisper_loss=0.1095, over 19431.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01212, ecapa_loss=0.0002771, whisper_loss=0.09697, over 3884734.09 frames. 
], batch size: 73, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:54:59,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=470770.0, ans=0.07 2024-08-10 08:55:16,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=470870.0, ans=0.125 2024-08-10 08:55:29,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.993e+01 3.332e+01 3.946e+01 5.463e+01, threshold=6.665e+01, percent-clipped=0.0 2024-08-10 08:55:53,905 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-10 08:55:54,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=471070.0, ans=0.125 2024-08-10 08:55:55,910 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.66 vs. limit=15.0 2024-08-10 08:56:01,919 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 08:56:10,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3650, loss[loss=0.1143, beats_loss=0.01155, ecapa_loss=0.0002998, whisper_loss=0.09975, over 18957.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01211, ecapa_loss=0.0002783, whisper_loss=0.09715, over 3860995.55 frames. ], batch size: 77, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:56:22,291 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-10 08:56:24,723 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
30 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 08:56:28,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=471370.0, ans=0.0 2024-08-10 08:56:29,776 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2024-08-10 08:56:33,349 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 08:56:41,981 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-08-10 08:57:04,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=471570.0, ans=0.125 2024-08-10 08:57:07,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=471670.0, ans=0.125 2024-08-10 08:57:08,034 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 08:57:13,109 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 08:57:20,479 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3700, loss[loss=0.1286, beats_loss=0.01017, ecapa_loss=0.0002915, whisper_loss=0.1155, over 18139.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01211, ecapa_loss=0.0002797, whisper_loss=0.09745, over 3865229.08 frames. ], batch size: 70, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:57:28,614 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
41 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-10 08:57:28,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471770.0, ans=0.1 2024-08-10 08:57:32,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2024-08-10 08:57:42,081 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-10 08:57:42,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=471870.0, ans=0.125 2024-08-10 08:57:50,641 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 08:57:51,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 3.070e+01 3.607e+01 4.290e+01 1.526e+02, threshold=7.214e+01, percent-clipped=4.0 2024-08-10 08:57:56,008 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 08:58:19,738 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 08:58:26,712 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.50 vs. limit=22.5 2024-08-10 08:58:27,359 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3750, loss[loss=0.1067, beats_loss=0.01404, ecapa_loss=0.0002349, whisper_loss=0.09035, over 14454.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01224, ecapa_loss=0.0002795, whisper_loss=0.09741, over 3877818.58 frames. ], batch size: 54, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:58:36,204 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 08:58:53,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.90 vs. limit=22.5 2024-08-10 08:58:55,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=472370.0, ans=0.125 2024-08-10 08:59:00,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=472470.0, ans=0.125 2024-08-10 08:59:08,793 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.20 vs. limit=10.0 2024-08-10 08:59:25,707 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 08:59:40,992 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3800, loss[loss=0.1123, beats_loss=0.009661, ecapa_loss=0.0003491, whisper_loss=0.0992, over 16523.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.0123, ecapa_loss=0.0002795, whisper_loss=0.09754, over 3879652.37 frames. ], batch size: 70, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:00:08,719 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-10 09:00:13,796 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.142e+01 3.387e+01 4.333e+01 6.143e+01, threshold=6.774e+01, percent-clipped=0.0 2024-08-10 09:00:45,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=473170.0, ans=0.125 2024-08-10 09:00:51,183 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-10 09:00:52,256 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3850, loss[loss=0.1221, beats_loss=0.01444, ecapa_loss=0.0002526, whisper_loss=0.1052, over 23822.00 frames. 
], tot_loss[loss=0.1124, beats_loss=0.01232, ecapa_loss=0.0002803, whisper_loss=0.09726, over 3894792.24 frames. ], batch size: 94, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:00:53,709 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-10 09:01:15,121 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-10 09:01:32,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.83 vs. limit=15.0 2024-08-10 09:01:42,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=473570.0, ans=0.125 2024-08-10 09:01:43,679 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 29 from LS+wenet, 33 from Vox, 34 fro AS 2024-08-10 09:02:04,740 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3900, loss[loss=0.1199, beats_loss=0.01199, ecapa_loss=0.000289, whisper_loss=0.105, over 22573.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01237, ecapa_loss=0.0002806, whisper_loss=0.09734, over 3904245.75 frames. ], batch size: 89, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:02:10,645 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 09:02:14,969 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 09:02:16,831 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 09:02:22,357 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-10 09:02:22,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=473870.0, ans=0.0 2024-08-10 09:02:31,151 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
36 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 09:02:38,661 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.399e+01 3.153e+01 3.691e+01 4.376e+01 6.503e+01, threshold=7.382e+01, percent-clipped=0.0 2024-08-10 09:02:55,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474070.0, ans=0.1 2024-08-10 09:03:01,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=474070.0, ans=0.125 2024-08-10 09:03:03,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2024-08-10 09:03:05,521 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 09:03:06,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474170.0, ans=0.1 2024-08-10 09:03:17,559 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 3950, loss[loss=0.119, beats_loss=0.01218, ecapa_loss=0.0003181, whisper_loss=0.1037, over 21943.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01233, ecapa_loss=0.000283, whisper_loss=0.09773, over 3936007.35 frames. ], batch size: 92, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:03:22,378 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 31 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-10 09:03:29,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=474270.0, ans=0.125 2024-08-10 09:03:33,271 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 15 from Vox, 49 fro AS 2024-08-10 09:03:36,701 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
19 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-10 09:03:49,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=474470.0, ans=0.125 2024-08-10 09:04:11,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=474570.0, ans=10.0 2024-08-10 09:04:27,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=474770.0, ans=0.125 2024-08-10 09:04:28,843 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4000, loss[loss=0.09464, beats_loss=0.01145, ecapa_loss=0.0003839, whisper_loss=0.07936, over 15151.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01232, ecapa_loss=0.0002814, whisper_loss=0.09752, over 3920925.59 frames. ], batch size: 64, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:04:43,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=474870.0, ans=0.125 2024-08-10 09:05:02,619 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 3.204e+01 3.613e+01 4.111e+01 7.755e+01, threshold=7.226e+01, percent-clipped=1.0 2024-08-10 09:05:14,648 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 09:05:17,312 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 09:05:17,613 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=6.091e-01 2024-08-10 09:05:18,667 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-10 09:05:35,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.73 vs. 
limit=15.0 2024-08-10 09:05:44,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4050, loss[loss=0.1048, beats_loss=0.01048, ecapa_loss=0.0002682, whisper_loss=0.09164, over 22873.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01231, ecapa_loss=0.0002802, whisper_loss=0.09692, over 3934914.86 frames. ], batch size: 88, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:05:53,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=475270.0, ans=0.2 2024-08-10 09:06:10,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475370.0, ans=0.1 2024-08-10 09:06:20,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=475470.0, ans=0.2 2024-08-10 09:06:30,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.58 vs. limit=22.5 2024-08-10 09:06:50,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=475670.0, ans=0.125 2024-08-10 09:06:55,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=475670.0, ans=0.0 2024-08-10 09:06:57,610 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4100, loss[loss=0.08061, beats_loss=0.01578, ecapa_loss=0.0002977, whisper_loss=0.06186, over 15067.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01228, ecapa_loss=0.0002798, whisper_loss=0.0964, over 3881312.93 frames. ], batch size: 64, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:07:02,830 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 09:07:14,988 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
17 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 09:07:23,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=475870.0, ans=0.125 2024-08-10 09:07:34,206 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.441e+01 3.046e+01 3.447e+01 3.852e+01 5.765e+01, threshold=6.895e+01, percent-clipped=0.0 2024-08-10 09:07:36,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=475970.0, ans=0.0 2024-08-10 09:08:00,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=476170.0, ans=0.125 2024-08-10 09:08:07,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=476170.0, ans=0.07 2024-08-10 09:08:12,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476170.0, ans=0.1 2024-08-10 09:08:14,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=476270.0, ans=0.125 2024-08-10 09:08:15,269 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4150, loss[loss=0.1229, beats_loss=0.01174, ecapa_loss=0.0002776, whisper_loss=0.1084, over 16728.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01224, ecapa_loss=0.000282, whisper_loss=0.09655, over 3885336.91 frames. ], batch size: 67, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:08:17,585 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 09:08:21,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=476270.0, ans=0.1 2024-08-10 09:08:43,516 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
22 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-10 09:08:55,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=476470.0, ans=0.125 2024-08-10 09:09:01,939 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 09:09:32,049 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-10 09:09:38,084 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4200, loss[loss=0.1272, beats_loss=0.01364, ecapa_loss=0.0002726, whisper_loss=0.1109, over 19933.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01223, ecapa_loss=0.0002812, whisper_loss=0.09698, over 3878525.83 frames. ], batch size: 81, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:09:41,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=476770.0, ans=0.125 2024-08-10 09:09:55,397 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 09:09:57,132 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 09:09:58,409 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-10 09:10:02,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=476870.0, ans=0.0 2024-08-10 09:10:12,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=476970.0, ans=0.2 2024-08-10 09:10:12,934 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.087e+01 3.164e+01 3.633e+01 4.360e+01 6.348e+01, threshold=7.265e+01, percent-clipped=0.0 2024-08-10 09:10:34,367 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 09:10:42,076 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 09:10:44,205 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-10 09:10:46,043 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 09:10:49,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=477170.0, ans=0.125 2024-08-10 09:10:56,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4250, loss[loss=0.1298, beats_loss=0.01212, ecapa_loss=0.000233, whisper_loss=0.1153, over 23781.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01215, ecapa_loss=0.0002807, whisper_loss=0.09703, over 3864270.97 frames. ], batch size: 91, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:10:58,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=477270.0, ans=0.2 2024-08-10 09:11:19,043 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2024-08-10 09:11:21,174 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 09:11:33,047 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=15.0 2024-08-10 09:11:40,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.22 vs. limit=22.5 2024-08-10 09:11:44,320 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 09:11:46,134 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
20 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-10 09:12:02,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=477670.0, ans=0.125 2024-08-10 09:12:07,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=477670.0, ans=0.07 2024-08-10 09:12:08,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=477670.0, ans=0.0 2024-08-10 09:12:11,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=477670.0, ans=0.1 2024-08-10 09:12:16,169 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4300, loss[loss=0.1465, beats_loss=0.009528, ecapa_loss=0.0002601, whisper_loss=0.1344, over 22903.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01208, ecapa_loss=0.0002781, whisper_loss=0.09722, over 3856427.39 frames. ], batch size: 85, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:12:22,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2024-08-10 09:12:35,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=477870.0, ans=0.0 2024-08-10 09:12:36,435 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 09:12:54,980 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.896e+01 3.194e+01 3.711e+01 5.609e+01, threshold=6.388e+01, percent-clipped=0.0 2024-08-10 09:13:02,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=477970.0, ans=0.2 2024-08-10 09:13:05,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=478070.0, ans=0.0 2024-08-10 09:13:10,922 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 09:13:23,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=478170.0, ans=0.2 2024-08-10 09:13:25,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2024-08-10 09:13:34,365 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4350, loss[loss=0.1241, beats_loss=0.01227, ecapa_loss=0.000276, whisper_loss=0.1091, over 19972.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01208, ecapa_loss=0.0002797, whisper_loss=0.09677, over 3847796.52 frames. ], batch size: 78, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:13:34,477 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 30 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 09:13:37,783 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 09:14:08,437 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 09:14:22,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.34 vs. 
limit=15.0 2024-08-10 09:14:23,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=478570.0, ans=0.125 2024-08-10 09:14:29,039 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 09:14:30,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=478570.0, ans=0.0 2024-08-10 09:14:36,021 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-10 09:14:36,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=478670.0, ans=0.125 2024-08-10 09:14:41,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2024-08-10 09:14:42,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=478670.0, ans=0.0 2024-08-10 09:14:48,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=478670.0, ans=0.1 2024-08-10 09:14:51,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4400, loss[loss=0.1355, beats_loss=0.0113, ecapa_loss=0.0002456, whisper_loss=0.1218, over 23158.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01205, ecapa_loss=0.0002779, whisper_loss=0.09697, over 3838796.26 frames. 
], batch size: 89, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:15:24,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=478970.0, ans=0.05 2024-08-10 09:15:25,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.527e+01 3.041e+01 3.447e+01 3.976e+01 9.860e+01, threshold=6.894e+01, percent-clipped=1.0 2024-08-10 09:15:28,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=478970.0, ans=0.0 2024-08-10 09:15:38,397 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 09:15:45,885 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.371e-01 2024-08-10 09:16:04,213 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4450, loss[loss=0.1135, beats_loss=0.008199, ecapa_loss=0.0003456, whisper_loss=0.1018, over 13426.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01192, ecapa_loss=0.0002809, whisper_loss=0.09791, over 3844050.99 frames. ], batch size: 55, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:16:12,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=479270.0, ans=0.0 2024-08-10 09:16:28,119 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 09:16:35,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=479470.0, ans=0.0 2024-08-10 09:16:42,248 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 09:16:56,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.40 vs. 
limit=12.0 2024-08-10 09:17:11,955 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4500, loss[loss=0.1161, beats_loss=0.01186, ecapa_loss=0.0002942, whisper_loss=0.1013, over 18211.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01197, ecapa_loss=0.0002789, whisper_loss=0.09762, over 3844296.16 frames. ], batch size: 73, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:17:12,145 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 16 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 09:17:34,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2024-08-10 09:17:44,009 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.468e+01 3.224e+01 3.675e+01 4.252e+01 6.669e+01, threshold=7.350e+01, percent-clipped=1.0 2024-08-10 09:17:44,169 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 09:17:59,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480070.0, ans=0.1 2024-08-10 09:18:05,307 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 09:18:15,040 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 09:18:20,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4550, loss[loss=0.09797, beats_loss=0.01474, ecapa_loss=0.000311, whisper_loss=0.08012, over 20522.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01212, ecapa_loss=0.0002803, whisper_loss=0.09589, over 3844725.30 frames. ], batch size: 90, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:18:25,945 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 09:18:30,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=480270.0, ans=0.125 2024-08-10 09:18:36,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=480370.0, ans=0.0 2024-08-10 09:18:45,393 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. limit=6.0 2024-08-10 09:18:56,056 INFO [train_multi_KD3.py:844] (2/4) A total of 97 cuts. 25 from LS+wenet, 36 from Vox, 36 fro AS 2024-08-10 09:19:05,073 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 32 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-10 09:19:09,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=480570.0, ans=0.125 2024-08-10 09:19:11,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=480570.0, ans=0.0 2024-08-10 09:19:13,454 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 09:19:27,564 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4600, loss[loss=0.1178, beats_loss=0.01001, ecapa_loss=0.0002851, whisper_loss=0.1049, over 22841.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01216, ecapa_loss=0.0002783, whisper_loss=0.09612, over 3887294.62 frames. ], batch size: 92, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:19:31,565 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 09:19:34,111 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 26 from Vox, 34 from AS 2024-08-10 09:19:39,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=480870.0, ans=0.0 2024-08-10 09:19:56,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=480970.0, ans=0.1 2024-08-10 09:19:56,965 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 3.144e+01 3.622e+01 4.296e+01 6.398e+01, threshold=7.244e+01, percent-clipped=0.0 2024-08-10 09:20:01,522 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 31 from Vox, 36 from AS 2024-08-10 09:20:12,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=481070.0, ans=0.2 2024-08-10 09:20:13,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=481070.0, ans=0.125 2024-08-10 09:20:20,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481170.0, ans=0.1 2024-08-10 09:20:32,927 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4650, loss[loss=0.1252, beats_loss=0.01051, ecapa_loss=0.0003018, whisper_loss=0.1117, over 19332.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01229, ecapa_loss=0.0002764, whisper_loss=0.09595, over 3881488.68 frames. ], batch size: 76, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:20:53,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=481370.0, ans=0.125 2024-08-10 09:21:01,335 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts.
21 from LS+wenet, 28 from Vox, 33 from AS 2024-08-10 09:21:18,741 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.20 vs. limit=6.0 2024-08-10 09:21:30,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=481670.0, ans=0.05 2024-08-10 09:21:35,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=481670.0, ans=0.2 2024-08-10 09:21:37,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4700, loss[loss=0.1374, beats_loss=0.0109, ecapa_loss=0.0002536, whisper_loss=0.124, over 21929.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.0123, ecapa_loss=0.0002777, whisper_loss=0.09569, over 3889303.87 frames. ], batch size: 85, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:22:07,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.494e+01 3.095e+01 3.461e+01 3.864e+01 6.358e+01, threshold=6.922e+01, percent-clipped=0.0 2024-08-10 09:22:25,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=482070.0, ans=0.125 2024-08-10 09:22:42,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4750, loss[loss=0.1144, beats_loss=0.01109, ecapa_loss=0.0003198, whisper_loss=0.1001, over 21680.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01219, ecapa_loss=0.0002771, whisper_loss=0.09658, over 3895732.10 frames.
], batch size: 90, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:22:46,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=482270.0, ans=0.0 2024-08-10 09:23:12,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=482470.0, ans=0.125 2024-08-10 09:23:13,008 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 from AS 2024-08-10 09:23:16,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=482470.0, ans=0.125 2024-08-10 09:23:19,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=482570.0, ans=0.125 2024-08-10 09:23:20,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=482570.0, ans=10.0 2024-08-10 09:23:31,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=482570.0, ans=0.2 2024-08-10 09:23:45,983 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4800, loss[loss=0.1193, beats_loss=0.0127, ecapa_loss=0.0002758, whisper_loss=0.1038, over 23058.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01224, ecapa_loss=0.0002756, whisper_loss=0.09708, over 3900192.82 frames. ], batch size: 93, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:23:46,126 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 from AS 2024-08-10 09:24:04,421 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.24 vs.
limit=15.0 2024-08-10 09:24:05,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.01 vs. limit=15.0 2024-08-10 09:24:06,891 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2024-08-10 09:24:14,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=482970.0, ans=0.1 2024-08-10 09:24:14,806 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.601e+01 3.085e+01 3.419e+01 4.209e+01 9.011e+01, threshold=6.838e+01, percent-clipped=2.0 2024-08-10 09:24:27,112 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2024-08-10 09:24:32,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=483070.0, ans=0.0 2024-08-10 09:24:38,130 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2024-08-10 09:24:46,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=483170.0, ans=0.0 2024-08-10 09:24:49,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4850, loss[loss=0.1212, beats_loss=0.01204, ecapa_loss=0.0003118, whisper_loss=0.1061, over 21571.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01222, ecapa_loss=0.0002769, whisper_loss=0.09703, over 3890499.14 frames. 
], batch size: 89, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:24:49,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=483270.0, ans=0.0 2024-08-10 09:25:05,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=483370.0, ans=0.2 2024-08-10 09:25:07,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=483370.0, ans=0.2 2024-08-10 09:25:13,165 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 from AS 2024-08-10 09:25:23,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=483470.0, ans=0.125 2024-08-10 09:25:26,065 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 from AS 2024-08-10 09:25:30,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=483570.0, ans=0.07 2024-08-10 09:25:43,009 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 from AS 2024-08-10 09:25:47,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=483670.0, ans=0.0 2024-08-10 09:25:48,763 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 30 from Vox, 31 from AS 2024-08-10 09:26:02,262 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4900, loss[loss=0.1257, beats_loss=0.01086, ecapa_loss=0.0003289, whisper_loss=0.1116, over 23180.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01223, ecapa_loss=0.0002766, whisper_loss=0.09694, over 3901821.83 frames. ], batch size: 93, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:26:02,414 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts.
25 from LS+wenet, 24 from Vox, 44 from AS 2024-08-10 09:26:08,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=483770.0, ans=0.125 2024-08-10 09:26:30,619 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 from AS 2024-08-10 09:26:32,111 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 9 from Vox, 32 from AS 2024-08-10 09:26:39,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 3.191e+01 3.639e+01 4.118e+01 6.849e+01, threshold=7.278e+01, percent-clipped=1.0 2024-08-10 09:26:50,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=484070.0, ans=0.0 2024-08-10 09:27:26,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=484170.0, ans=0.035 2024-08-10 09:27:29,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 4950, loss[loss=0.09451, beats_loss=0.01155, ecapa_loss=0.0003086, whisper_loss=0.07988, over 16124.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01232, ecapa_loss=0.0002754, whisper_loss=0.09655, over 3910311.48 frames. ], batch size: 67, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:27:36,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=484270.0, ans=0.125 2024-08-10 09:27:41,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.15 vs. limit=22.5 2024-08-10 09:27:46,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.22 vs.
limit=15.0 2024-08-10 09:27:49,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=484370.0, ans=0.125 2024-08-10 09:28:05,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=484370.0, ans=0.125 2024-08-10 09:28:07,637 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 22 from Vox, 23 from AS 2024-08-10 09:28:39,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=484570.0, ans=0.0 2024-08-10 09:29:06,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5000, loss[loss=0.1063, beats_loss=0.01368, ecapa_loss=0.0002685, whisper_loss=0.0899, over 22386.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01231, ecapa_loss=0.0002725, whisper_loss=0.09694, over 3911107.20 frames. ], batch size: 93, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:29:21,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=484770.0, ans=0.125 2024-08-10 09:29:22,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=484770.0, ans=0.125 2024-08-10 09:29:24,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=484870.0, ans=0.125 2024-08-10 09:29:37,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=484870.0, ans=0.0 2024-08-10 09:29:52,084 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.456e+01 3.034e+01 3.424e+01 4.085e+01 5.403e+01, threshold=6.848e+01, percent-clipped=0.0 2024-08-10 09:29:58,640 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts.
14 from LS+wenet, 20 from Vox, 25 from AS 2024-08-10 09:30:14,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=485070.0, ans=0.2 2024-08-10 09:30:31,150 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 from AS 2024-08-10 09:30:35,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=485170.0, ans=0.125 2024-08-10 09:30:44,492 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5050, loss[loss=0.0946, beats_loss=0.01475, ecapa_loss=0.0002071, whisper_loss=0.07778, over 14755.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01222, ecapa_loss=0.0002734, whisper_loss=0.09755, over 3893431.14 frames. ], batch size: 56, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:30:44,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=485270.0, ans=0.1 2024-08-10 09:31:05,563 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 27 from Vox, 41 from AS 2024-08-10 09:31:12,624 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=19.33 vs. limit=15.0 2024-08-10 09:31:43,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=485570.0, ans=0.125 2024-08-10 09:31:48,629 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 13 from Vox, 27 from AS 2024-08-10 09:32:08,639 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 17 from Vox, 37 from AS 2024-08-10 09:32:14,911 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts.
19 from LS+wenet, 13 from Vox, 23 from AS 2024-08-10 09:32:15,183 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 09:32:16,052 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5100, loss[loss=0.1089, beats_loss=0.01265, ecapa_loss=0.0002873, whisper_loss=0.0934, over 13658.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01226, ecapa_loss=0.0002743, whisper_loss=0.09671, over 3909058.68 frames. ], batch size: 55, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:32:20,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=485770.0, ans=0.125 2024-08-10 09:32:29,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=485870.0, ans=0.1 2024-08-10 09:32:32,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.98 vs. limit=10.0 2024-08-10 09:32:37,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=485870.0, ans=0.2 2024-08-10 09:32:37,978 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
19 from LS+wenet, 24 from Vox, 44 from AS 2024-08-10 09:32:45,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.495e+01 3.245e+01 3.767e+01 4.403e+01 1.091e+02, threshold=7.533e+01, percent-clipped=4.0 2024-08-10 09:33:03,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=486070.0, ans=0.0 2024-08-10 09:33:05,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=486070.0, ans=0.125 2024-08-10 09:33:14,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=486170.0, ans=0.125 2024-08-10 09:33:20,280 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5150, loss[loss=0.1156, beats_loss=0.01355, ecapa_loss=0.0002082, whisper_loss=0.09995, over 18805.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01224, ecapa_loss=0.0002751, whisper_loss=0.09701, over 3904580.81 frames. ], batch size: 73, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:33:24,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=486270.0, ans=0.035 2024-08-10 09:33:29,017 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 11 from Vox, 30 from AS 2024-08-10 09:33:40,296 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 from AS 2024-08-10 09:33:41,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-10 09:33:58,713 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs.
limit=6.0 2024-08-10 09:34:13,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=12.0 2024-08-10 09:34:19,878 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs. limit=10.0 2024-08-10 09:34:22,753 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5200, loss[loss=0.1127, beats_loss=0.01039, ecapa_loss=0.0003137, whisper_loss=0.09914, over 19308.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01218, ecapa_loss=0.0002744, whisper_loss=0.09756, over 3918659.52 frames. ], batch size: 77, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:34:25,361 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 from AS 2024-08-10 09:34:25,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=486770.0, ans=0.5 2024-08-10 09:34:33,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486770.0, ans=0.1 2024-08-10 09:34:45,161 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.62 vs. limit=12.0 2024-08-10 09:34:50,920 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 20 from Vox, 19 from AS 2024-08-10 09:34:51,984 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 3.041e+01 3.408e+01 4.043e+01 9.843e+01, threshold=6.816e+01, percent-clipped=1.0 2024-08-10 09:34:56,150 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts.
23 from LS+wenet, 13 from Vox, 29 from AS 2024-08-10 09:35:15,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=487170.0, ans=0.125 2024-08-10 09:35:25,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5250, loss[loss=0.1094, beats_loss=0.01131, ecapa_loss=0.0002911, whisper_loss=0.09519, over 15624.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01216, ecapa_loss=0.000274, whisper_loss=0.09743, over 3889851.13 frames. ], batch size: 62, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:35:30,225 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.965e-02 2024-08-10 09:35:37,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0 2024-08-10 09:35:45,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=487370.0, ans=0.0 2024-08-10 09:35:54,266 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 from AS 2024-08-10 09:35:59,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=487470.0, ans=0.125 2024-08-10 09:35:59,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=487470.0, ans=0.125 2024-08-10 09:35:59,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=487470.0, ans=15.0 2024-08-10 09:36:03,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.55 vs. limit=15.0 2024-08-10 09:36:06,542 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts.
27 from LS+wenet, 21 from Vox, 23 from AS 2024-08-10 09:36:12,042 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 from AS 2024-08-10 09:36:12,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=487570.0, ans=0.0 2024-08-10 09:36:14,639 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 21 from Vox, 18 from AS 2024-08-10 09:36:29,520 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5300, loss[loss=0.1171, beats_loss=0.01377, ecapa_loss=0.0002569, whisper_loss=0.1007, over 23957.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01214, ecapa_loss=0.0002732, whisper_loss=0.09791, over 3885176.94 frames. ], batch size: 93, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:36:49,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=487870.0, ans=0.0 2024-08-10 09:36:58,666 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 3.142e+01 3.530e+01 4.338e+01 6.802e+01, threshold=7.061e+01, percent-clipped=0.0 2024-08-10 09:36:58,879 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 32 from Vox, 25 from AS 2024-08-10 09:37:03,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=487970.0, ans=0.2 2024-08-10 09:37:28,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=488170.0, ans=0.125 2024-08-10 09:37:28,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.36 vs. limit=15.0 2024-08-10 09:37:33,256 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5350, loss[loss=0.1339, beats_loss=0.009828, ecapa_loss=0.0002507, whisper_loss=0.1215, over 23538.00 frames.
], tot_loss[loss=0.112, beats_loss=0.01216, ecapa_loss=0.0002706, whisper_loss=0.09715, over 3877032.61 frames. ], batch size: 88, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:37:38,539 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 from AS 2024-08-10 09:37:46,635 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0 2024-08-10 09:38:00,534 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.44 vs. limit=15.0 2024-08-10 09:38:18,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=488570.0, ans=0.0 2024-08-10 09:38:25,541 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 from AS 2024-08-10 09:38:33,135 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 from AS 2024-08-10 09:38:36,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5400, loss[loss=0.1064, beats_loss=0.01385, ecapa_loss=0.0002593, whisper_loss=0.08993, over 21908.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01202, ecapa_loss=0.0002711, whisper_loss=0.09752, over 3859492.09 frames. ], batch size: 89, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:38:49,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=488870.0, ans=0.125 2024-08-10 09:39:01,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=488970.0, ans=0.0 2024-08-10 09:39:03,110 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts.
18 from LS+wenet, 15 from Vox, 23 from AS 2024-08-10 09:39:05,537 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.881e+01 3.134e+01 3.602e+01 5.252e+01, threshold=6.268e+01, percent-clipped=0.0 2024-08-10 09:39:07,452 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2024-08-10 09:39:07,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.76 vs. limit=15.0 2024-08-10 09:39:15,707 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 36 from LS+wenet, 14 from Vox, 32 from AS 2024-08-10 09:39:27,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=489170.0, ans=0.2 2024-08-10 09:39:28,557 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 42 from LS+wenet, 15 from Vox, 39 from AS 2024-08-10 09:39:31,927 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.29 vs. limit=15.0 2024-08-10 09:39:39,752 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5450, loss[loss=0.1227, beats_loss=0.0105, ecapa_loss=0.0002468, whisper_loss=0.1097, over 20221.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01203, ecapa_loss=0.0002708, whisper_loss=0.09737, over 3862072.36 frames.
], batch size: 76, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:39:49,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=489270.0, ans=0.05 2024-08-10 09:39:50,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=489270.0, ans=0.125 2024-08-10 09:39:59,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=489370.0, ans=0.09899494936611666 2024-08-10 09:40:11,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.16 vs. limit=22.5 2024-08-10 09:40:15,299 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 19 from Vox, 37 from AS 2024-08-10 09:40:18,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=489570.0, ans=0.125 2024-08-10 09:40:23,100 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 from AS 2024-08-10 09:40:26,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.35 vs. limit=15.0 2024-08-10 09:40:32,239 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 18 from Vox, 39 from AS 2024-08-10 09:40:43,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5500, loss[loss=0.08986, beats_loss=0.01595, ecapa_loss=0.0002073, whisper_loss=0.07183, over 17315.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.0121, ecapa_loss=0.0002707, whisper_loss=0.09726, over 3879658.31 frames.
], batch size: 69, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:40:46,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=489770.0, ans=0.125 2024-08-10 09:41:00,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=489870.0, ans=0.0 2024-08-10 09:41:01,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=489870.0, ans=0.125 2024-08-10 09:41:07,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=489970.0, ans=0.1 2024-08-10 09:41:12,348 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 3.173e+01 3.591e+01 4.081e+01 1.350e+02, threshold=7.183e+01, percent-clipped=1.0 2024-08-10 09:41:14,071 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 from AS 2024-08-10 09:41:14,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=489970.0, ans=0.07 2024-08-10 09:41:15,650 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.714e-01 2024-08-10 09:41:17,481 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=12.0 2024-08-10 09:41:21,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=490070.0, ans=0.0 2024-08-10 09:41:26,277 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 23 from Vox, 37 from AS 2024-08-10 09:41:32,453 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts.
26 from LS+wenet, 19 from Vox, 35 from AS 2024-08-10 09:41:34,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=490170.0, ans=0.125 2024-08-10 09:41:47,826 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5550, loss[loss=0.1234, beats_loss=0.01005, ecapa_loss=0.000296, whisper_loss=0.1104, over 22466.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01205, ecapa_loss=0.0002725, whisper_loss=0.09737, over 3884600.97 frames. ], batch size: 90, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:41:49,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=490270.0, ans=0.125 2024-08-10 09:41:53,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=490270.0, ans=0.0 2024-08-10 09:42:07,385 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 from AS 2024-08-10 09:42:16,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=490470.0, ans=0.1 2024-08-10 09:42:28,820 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 from AS 2024-08-10 09:42:30,055 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 26 from Vox, 31 from AS 2024-08-10 09:42:39,090 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 18 from Vox, 22 from AS 2024-08-10 09:42:47,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2024-08-10 09:42:51,083 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5600, loss[loss=0.1117, beats_loss=0.01235, ecapa_loss=0.000288, whisper_loss=0.09644, over 22110.00 frames.
], tot_loss[loss=0.1116, beats_loss=0.01216, ecapa_loss=0.0002719, whisper_loss=0.09673, over 3876302.17 frames. ], batch size: 91, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:42:53,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=490770.0, ans=0.125 2024-08-10 09:43:03,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=490870.0, ans=0.2 2024-08-10 09:43:04,064 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 from AS 2024-08-10 09:43:16,501 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 from AS 2024-08-10 09:43:20,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 3.015e+01 3.404e+01 4.297e+01 6.726e+01, threshold=6.809e+01, percent-clipped=0.0 2024-08-10 09:43:46,786 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 from AS 2024-08-10 09:43:48,025 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 from AS 2024-08-10 09:43:49,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2024-08-10 09:43:49,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2024-08-10 09:43:55,742 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5650, loss[loss=0.1289, beats_loss=0.01292, ecapa_loss=0.0001923, whisper_loss=0.114, over 18413.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01224, ecapa_loss=0.0002722, whisper_loss=0.09602, over 3906008.65 frames. 
], batch size: 71, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:44:05,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=491270.0, ans=0.0 2024-08-10 09:44:16,308 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 32 from Vox, 32 from AS 2024-08-10 09:44:25,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2024-08-10 09:44:29,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=491470.0, ans=0.0 2024-08-10 09:44:31,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=491470.0, ans=0.07 2024-08-10 09:44:39,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=491570.0, ans=0.95 2024-08-10 09:44:44,338 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 from AS 2024-08-10 09:44:59,322 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5700, loss[loss=0.1129, beats_loss=0.01108, ecapa_loss=0.0002441, whisper_loss=0.0994, over 14606.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01214, ecapa_loss=0.0002739, whisper_loss=0.09663, over 3905771.07 frames. ], batch size: 58, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:45:00,830 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
33 from LS+wenet, 18 from Vox, 38 from AS 2024-08-10 09:45:08,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=491770.0, ans=0.05 2024-08-10 09:45:09,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=491770.0, ans=0.125 2024-08-10 09:45:31,955 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 from AS 2024-08-10 09:45:33,069 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 3.076e+01 3.438e+01 4.149e+01 8.224e+01, threshold=6.876e+01, percent-clipped=3.0 2024-08-10 09:46:00,872 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 27 from Vox, 40 from AS 2024-08-10 09:46:01,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.34 vs. limit=10.0 2024-08-10 09:46:09,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=492170.0, ans=0.125 2024-08-10 09:46:14,582 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5750, loss[loss=0.09219, beats_loss=0.01315, ecapa_loss=0.0001735, whisper_loss=0.07731, over 18026.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.0121, ecapa_loss=0.0002754, whisper_loss=0.0963, over 3899615.18 frames. ], batch size: 66, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:46:14,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=492270.0, ans=0.0 2024-08-10 09:46:33,678 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
21 from LS+wenet, 23 from Vox, 30 from AS 2024-08-10 09:46:35,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=492370.0, ans=0.125 2024-08-10 09:46:49,371 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 from AS 2024-08-10 09:46:51,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=492470.0, ans=0.125 2024-08-10 09:46:54,351 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 15 from Vox, 42 from AS 2024-08-10 09:46:54,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=492470.0, ans=0.0 2024-08-10 09:47:00,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=492470.0, ans=0.0 2024-08-10 09:47:07,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=492570.0, ans=0.0 2024-08-10 09:47:09,729 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2024-08-10 09:47:18,229 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 22 from Vox, 40 from AS 2024-08-10 09:47:18,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=492670.0, ans=0.0 2024-08-10 09:47:29,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=492670.0, ans=0.125 2024-08-10 09:47:34,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=492670.0, ans=0.1 2024-08-10 09:47:37,989 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5800, loss[loss=0.1026, beats_loss=0.01238, ecapa_loss=0.0002999, whisper_loss=0.08721, over 14523.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01225, ecapa_loss=0.0002743, whisper_loss=0.09526, over 3881289.93 frames. ], batch size: 58, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:47:43,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=492770.0, ans=0.125 2024-08-10 09:47:51,488 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 09:47:59,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492870.0, ans=0.1 2024-08-10 09:48:11,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+01 3.153e+01 3.469e+01 4.030e+01 1.339e+02, threshold=6.938e+01, percent-clipped=1.0 2024-08-10 09:48:22,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=493070.0, ans=0.125 2024-08-10 09:48:23,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493070.0, ans=0.1 2024-08-10 09:48:36,905 INFO [scaling.py:1024] (2/4) Whitening: 
name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2024-08-10 09:48:40,293 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 from AS 2024-08-10 09:48:40,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=493170.0, ans=0.0 2024-08-10 09:48:47,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5850, loss[loss=0.1245, beats_loss=0.008365, ecapa_loss=0.0002703, whisper_loss=0.1135, over 16002.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01228, ecapa_loss=0.0002739, whisper_loss=0.09503, over 3881107.69 frames. ], batch size: 58, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:48:48,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=493270.0, ans=15.0 2024-08-10 09:48:49,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=493270.0, ans=0.125 2024-08-10 09:48:53,216 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 09:49:02,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=493370.0, ans=0.05 2024-08-10 09:49:12,517 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.548e-03 2024-08-10 09:49:29,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=493570.0, ans=0.125 2024-08-10 09:49:38,427 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
16 from LS+wenet, 15 from Vox, 28 from AS 2024-08-10 09:49:38,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=493670.0, ans=0.125 2024-08-10 09:49:49,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=493670.0, ans=0.1 2024-08-10 09:49:51,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5900, loss[loss=0.1212, beats_loss=0.01014, ecapa_loss=0.000299, whisper_loss=0.108, over 16346.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01215, ecapa_loss=0.0002747, whisper_loss=0.09536, over 3862425.56 frames. ], batch size: 64, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:49:51,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=493770.0, ans=0.125 2024-08-10 09:50:02,717 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-10 09:50:05,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=493870.0, ans=0.0 2024-08-10 09:50:15,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=493970.0, ans=0.2 2024-08-10 09:50:15,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=493970.0, ans=0.1 2024-08-10 09:50:20,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.415e+01 2.982e+01 3.256e+01 3.844e+01 1.503e+02, threshold=6.513e+01, percent-clipped=1.0 2024-08-10 09:50:28,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=494070.0, ans=0.0 2024-08-10 09:50:38,545 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
22 from LS+wenet, 18 from Vox, 19 from AS 2024-08-10 09:50:40,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=494070.0, ans=0.0 2024-08-10 09:50:46,307 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 30 from Vox, 36 from AS 2024-08-10 09:50:52,442 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 from AS 2024-08-10 09:50:54,622 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 5950, loss[loss=0.1088, beats_loss=0.01114, ecapa_loss=0.0002672, whisper_loss=0.09496, over 16398.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01223, ecapa_loss=0.0002744, whisper_loss=0.09527, over 3896863.65 frames. ], batch size: 65, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:51:17,929 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 from AS 2024-08-10 09:51:18,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=494370.0, ans=0.125 2024-08-10 09:51:41,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=494570.0, ans=0.0 2024-08-10 09:51:47,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=494670.0, ans=0.125 2024-08-10 09:51:58,749 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6000, loss[loss=0.09728, beats_loss=0.01565, ecapa_loss=0.0001923, whisper_loss=0.07971, over 17510.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01223, ecapa_loss=0.0002715, whisper_loss=0.09606, over 3888637.57 frames. 
], batch size: 69, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:51:58,750 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 09:52:39,962 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on ASR_libri: loss=0.2669, beats_loss=0, ecapa_loss=0.0008114, whisper_loss=0.2588, over 922467.00 frames. 2024-08-10 09:52:55,579 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on SV_voxceleb1: loss=0.00707, beats_loss=0, ecapa_loss=0.000707, whisper_loss=0, over 939242.00 frames. 2024-08-10 09:54:53,726 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on AT_audioset: loss=0.028, beats_loss=0.028, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 09:54:53,730 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 09:54:55,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=494770.0, ans=0.125 2024-08-10 09:55:00,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=494770.0, ans=0.0 2024-08-10 09:55:04,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=494770.0, ans=0.2 2024-08-10 09:55:07,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=494870.0, ans=0.0 2024-08-10 09:55:15,346 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 11 from LS+wenet, 30 from Vox, 30 from AS 2024-08-10 09:55:23,065 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.994e+01 3.624e+01 4.180e+01 6.998e+01, threshold=7.249e+01, percent-clipped=2.0 2024-08-10 09:55:27,081 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
24 from LS+wenet, 20 from Vox, 32 from AS 2024-08-10 09:55:27,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494970.0, ans=0.1 2024-08-10 09:55:31,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=495070.0, ans=0.125 2024-08-10 09:55:40,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=495070.0, ans=0.125 2024-08-10 09:55:48,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=495170.0, ans=0.07 2024-08-10 09:55:54,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=495170.0, ans=0.07 2024-08-10 09:55:58,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6050, loss[loss=0.1058, beats_loss=0.01601, ecapa_loss=0.0002238, whisper_loss=0.08753, over 17544.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01219, ecapa_loss=0.0002705, whisper_loss=0.09585, over 3866511.67 frames. ], batch size: 71, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:56:12,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=495370.0, ans=0.125 2024-08-10 09:56:20,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495370.0, ans=0.1 2024-08-10 09:56:53,626 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 from AS 2024-08-10 09:57:02,737 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6100, loss[loss=0.1386, beats_loss=0.009266, ecapa_loss=0.0003175, whisper_loss=0.1261, over 22441.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01208, ecapa_loss=0.0002727, whisper_loss=0.0968, over 3891086.43 frames. 
], batch size: 90, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:57:03,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495770.0, ans=0.1 2024-08-10 09:57:15,400 INFO [train_multi_KD3.py:844] (2/4) A total of 98 cuts. 28 from LS+wenet, 37 from Vox, 33 from AS 2024-08-10 09:57:26,955 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.15 vs. limit=22.5 2024-08-10 09:57:31,667 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.767e+01 3.161e+01 3.682e+01 7.056e+01, threshold=6.321e+01, percent-clipped=0.0 2024-08-10 09:57:51,453 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 14 from Vox, 31 from AS 2024-08-10 09:57:52,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=496170.0, ans=0.0 2024-08-10 09:57:55,158 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 from AS 2024-08-10 09:58:06,827 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6150, loss[loss=0.1274, beats_loss=0.01167, ecapa_loss=0.0002747, whisper_loss=0.113, over 17455.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01204, ecapa_loss=0.000273, whisper_loss=0.09684, over 3889762.00 frames. ], batch size: 70, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:58:12,309 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 28 from LS+wenet, 16 from Vox, 25 from AS 2024-08-10 09:58:34,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=496470.0, ans=0.0 2024-08-10 09:58:44,145 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
23 from LS+wenet, 19 from Vox, 22 from AS 2024-08-10 09:58:48,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=496570.0, ans=0.1 2024-08-10 09:58:58,439 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 21 from Vox, 18 from AS 2024-08-10 09:59:00,953 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 31 from LS+wenet, 20 from Vox, 44 from AS 2024-08-10 09:59:10,052 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-10 09:59:10,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6200, loss[loss=0.1007, beats_loss=0.01052, ecapa_loss=0.0003272, whisper_loss=0.08686, over 14410.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01212, ecapa_loss=0.0002715, whisper_loss=0.09666, over 3889504.87 frames. ], batch size: 59, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 09:59:12,211 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 from AS 2024-08-10 09:59:16,104 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 34 from Vox, 27 from AS 2024-08-10 09:59:39,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=496970.0, ans=15.0 2024-08-10 09:59:40,767 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 3.143e+01 3.568e+01 4.018e+01 6.093e+01, threshold=7.137e+01, percent-clipped=0.0 2024-08-10 09:59:43,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=496970.0, ans=0.0 2024-08-10 09:59:52,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=497070.0, ans=0.125 2024-08-10 09:59:54,140 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 23 from Vox, 38 from AS 2024-08-10 09:59:54,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=497070.0, ans=0.2 2024-08-10 10:00:16,276 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6250, loss[loss=0.1218, beats_loss=0.01123, ecapa_loss=0.0002479, whisper_loss=0.1081, over 23484.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01215, ecapa_loss=0.0002717, whisper_loss=0.09557, over 3884778.28 frames. ], batch size: 90, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:00:21,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=497270.0, ans=0.0 2024-08-10 10:00:25,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=497270.0, ans=0.125 2024-08-10 10:00:33,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=497370.0, ans=0.2 2024-08-10 10:00:40,500 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 from AS 2024-08-10 10:00:41,851 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 from AS 2024-08-10 10:00:48,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=497470.0, ans=0.125 2024-08-10 10:00:54,687 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 from AS 2024-08-10 10:01:20,902 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6300, loss[loss=0.1226, beats_loss=0.009957, ecapa_loss=0.0002724, whisper_loss=0.1099, over 22422.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01212, ecapa_loss=0.0002695, whisper_loss=0.09605, over 3894195.18 frames. 
], batch size: 88, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:01:23,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=497770.0, ans=0.0 2024-08-10 10:01:25,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2024-08-10 10:01:31,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=497770.0, ans=0.125 2024-08-10 10:01:38,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=497870.0, ans=0.125 2024-08-10 10:01:45,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=497970.0, ans=0.1 2024-08-10 10:01:50,440 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+01 3.096e+01 3.544e+01 4.139e+01 6.723e+01, threshold=7.089e+01, percent-clipped=0.0 2024-08-10 10:02:05,280 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 from AS 2024-08-10 10:02:18,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=498170.0, ans=0.05 2024-08-10 10:02:25,833 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6350, loss[loss=0.09638, beats_loss=0.01518, ecapa_loss=0.0002261, whisper_loss=0.07895, over 18328.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01217, ecapa_loss=0.0002712, whisper_loss=0.0959, over 3879725.83 frames. 
], batch size: 76, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:02:33,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=498270.0, ans=0.125 2024-08-10 10:02:39,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.11 vs. limit=12.0 2024-08-10 10:02:49,953 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 from AS 2024-08-10 10:03:06,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.74 vs. limit=22.5 2024-08-10 10:03:17,069 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 15 from Vox, 28 from AS 2024-08-10 10:03:19,631 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 29 from Vox, 32 from AS 2024-08-10 10:03:28,119 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2024-08-10 10:03:29,895 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6400, loss[loss=0.1135, beats_loss=0.01036, ecapa_loss=0.0003581, whisper_loss=0.09959, over 21742.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01225, ecapa_loss=0.0002713, whisper_loss=0.09574, over 3863927.26 frames. ], batch size: 92, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:03:37,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=498770.0, ans=0.2 2024-08-10 10:03:42,091 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.75 vs. 
limit=12.0 2024-08-10 10:03:42,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=498870.0, ans=0.015 2024-08-10 10:03:51,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=498870.0, ans=0.0 2024-08-10 10:04:01,640 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 3.035e+01 3.531e+01 4.097e+01 5.944e+01, threshold=7.062e+01, percent-clipped=0.0 2024-08-10 10:04:07,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=498970.0, ans=10.0 2024-08-10 10:04:17,923 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. limit=6.0 2024-08-10 10:04:38,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=499170.0, ans=0.0 2024-08-10 10:04:43,291 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6450, loss[loss=0.1229, beats_loss=0.009506, ecapa_loss=0.0003404, whisper_loss=0.11, over 18182.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01219, ecapa_loss=0.0002717, whisper_loss=0.09666, over 3876128.45 frames. 
], batch size: 73, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:04:53,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=499270.0, ans=0.0 2024-08-10 10:04:54,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=499270.0, ans=0.2 2024-08-10 10:04:54,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=499270.0, ans=0.0 2024-08-10 10:04:56,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=499270.0, ans=0.035 2024-08-10 10:04:56,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=499270.0, ans=0.2 2024-08-10 10:04:58,259 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 23 from Vox, 47 from AS 2024-08-10 10:05:58,902 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6500, loss[loss=0.1346, beats_loss=0.009059, ecapa_loss=0.0002557, whisper_loss=0.123, over 18254.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01213, ecapa_loss=0.0002736, whisper_loss=0.09668, over 3895255.42 frames. ], batch size: 70, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:06:14,734 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 20 from Vox, 42 from AS 2024-08-10 10:06:16,344 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 10 from Vox, 31 from AS 2024-08-10 10:06:19,643 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 from AS 2024-08-10 10:06:28,382 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
25 from LS+wenet, 25 from Vox, 34 from AS 2024-08-10 10:06:30,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=499970.0, ans=0.5 2024-08-10 10:06:33,924 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 3.134e+01 3.492e+01 3.881e+01 6.321e+01, threshold=6.984e+01, percent-clipped=0.0 2024-08-10 10:06:53,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=500070.0, ans=0.04949747468305833 2024-08-10 10:06:53,204 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.041e-02 2024-08-10 10:07:11,827 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 25 from LS+wenet, 16 from Vox, 22 from AS 2024-08-10 10:07:15,592 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6550, loss[loss=0.09722, beats_loss=0.01399, ecapa_loss=0.0002858, whisper_loss=0.08037, over 22181.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01215, ecapa_loss=0.0002716, whisper_loss=0.0972, over 3912068.77 frames. ], batch size: 90, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:07:30,796 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 29 from Vox, 32 from AS 2024-08-10 10:07:49,714 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 from AS 2024-08-10 10:07:55,712 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 28 from Vox, 32 from AS 2024-08-10 10:08:06,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=500570.0, ans=22.5 2024-08-10 10:08:06,579 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.79 vs. 
limit=22.5 2024-08-10 10:08:19,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=500570.0, ans=0.04949747468305833 2024-08-10 10:08:27,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=500670.0, ans=0.125 2024-08-10 10:08:32,150 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 10:08:41,456 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6600, loss[loss=0.119, beats_loss=0.01249, ecapa_loss=0.0002364, whisper_loss=0.1042, over 21972.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01208, ecapa_loss=0.0002729, whisper_loss=0.0977, over 3933501.64 frames. ], batch size: 86, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:08:42,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=500770.0, ans=0.0 2024-08-10 10:08:46,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=500770.0, ans=0.125 2024-08-10 10:08:49,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=500770.0, ans=0.2 2024-08-10 10:08:56,622 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 23 from Vox, 43 from AS 2024-08-10 10:08:59,689 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 30 from LS+wenet, 13 from Vox, 32 from AS 2024-08-10 10:09:13,026 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.41 vs. 
limit=15.0 2024-08-10 10:09:18,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 3.113e+01 3.580e+01 3.995e+01 6.180e+01, threshold=7.160e+01, percent-clipped=0.0 2024-08-10 10:09:34,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=501070.0, ans=0.125 2024-08-10 10:10:00,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6650, loss[loss=0.1142, beats_loss=0.01119, ecapa_loss=0.0002726, whisper_loss=0.1003, over 19460.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01205, ecapa_loss=0.0002723, whisper_loss=0.09726, over 3937297.25 frames. ], batch size: 75, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:10:24,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=501370.0, ans=0.125 2024-08-10 10:10:27,231 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 10:10:30,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=501370.0, ans=0.0 2024-08-10 10:10:37,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501470.0, ans=0.1 2024-08-10 10:10:38,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.99 vs. limit=22.5 2024-08-10 10:11:01,103 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
23 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-10 10:11:01,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=501570.0, ans=0.125 2024-08-10 10:11:04,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=501670.0, ans=0.1 2024-08-10 10:11:16,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=501670.0, ans=0.07 2024-08-10 10:11:18,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. limit=6.0 2024-08-10 10:11:20,645 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 10:11:21,929 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6700, loss[loss=0.1383, beats_loss=0.0106, ecapa_loss=0.0002214, whisper_loss=0.1255, over 25069.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01207, ecapa_loss=0.0002713, whisper_loss=0.09758, over 3942460.44 frames. ], batch size: 92, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:11:55,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=501970.0, ans=0.0 2024-08-10 10:12:00,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+01 2.966e+01 3.489e+01 3.963e+01 6.232e+01, threshold=6.977e+01, percent-clipped=0.0 2024-08-10 10:12:01,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-10 10:12:04,136 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 10:12:07,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=501970.0, ans=0.125 2024-08-10 10:12:21,447 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 10:12:26,326 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 10:12:32,423 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-10 10:12:35,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.53 vs. limit=10.0 2024-08-10 10:12:40,919 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 10:12:45,846 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6750, loss[loss=0.1228, beats_loss=0.01146, ecapa_loss=0.0002982, whisper_loss=0.1083, over 16615.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01208, ecapa_loss=0.0002699, whisper_loss=0.09817, over 3894321.59 frames. ], batch size: 66, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:13:23,258 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-10 10:13:23,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=502470.0, ans=0.125 2024-08-10 10:13:30,530 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
31 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 10:13:49,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=502570.0, ans=0.125 2024-08-10 10:14:01,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=502670.0, ans=0.1 2024-08-10 10:14:11,162 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6800, loss[loss=0.08608, beats_loss=0.01491, ecapa_loss=0.0002533, whisper_loss=0.06864, over 18380.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01205, ecapa_loss=0.0002709, whisper_loss=0.09822, over 3895938.85 frames. ], batch size: 77, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:14:34,553 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 10:14:50,967 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.003e+01 3.545e+01 4.063e+01 8.445e+01, threshold=7.089e+01, percent-clipped=2.0 2024-08-10 10:15:02,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=503070.0, ans=0.125 2024-08-10 10:15:05,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=503070.0, ans=0.0 2024-08-10 10:15:22,106 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 10:15:34,605 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-10 10:15:35,992 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6850, loss[loss=0.1132, beats_loss=0.01441, ecapa_loss=0.0002097, whisper_loss=0.09673, over 23653.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01215, ecapa_loss=0.0002696, whisper_loss=0.09748, over 3901288.05 frames. 
], batch size: 93, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:16:09,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=503470.0, ans=0.1 2024-08-10 10:16:10,867 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 10:16:18,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=503470.0, ans=0.125 2024-08-10 10:16:26,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=503570.0, ans=0.0 2024-08-10 10:16:42,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=503670.0, ans=0.125 2024-08-10 10:16:54,219 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6900, loss[loss=0.1133, beats_loss=0.01264, ecapa_loss=0.0002086, whisper_loss=0.09855, over 18245.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01219, ecapa_loss=0.0002671, whisper_loss=0.09695, over 3904895.31 frames. ], batch size: 69, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:16:54,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=503770.0, ans=0.2 2024-08-10 10:17:02,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=503770.0, ans=12.0 2024-08-10 10:17:09,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=503870.0, ans=0.125 2024-08-10 10:17:21,497 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 10:17:30,421 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 3.010e+01 3.385e+01 3.920e+01 6.674e+01, threshold=6.771e+01, percent-clipped=0.0 2024-08-10 10:17:37,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=503970.0, ans=0.5 2024-08-10 10:17:40,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=504070.0, ans=0.125 2024-08-10 10:17:54,288 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-10 10:17:54,540 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.21 vs. limit=15.0 2024-08-10 10:18:04,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=504170.0, ans=0.0 2024-08-10 10:18:14,586 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 6950, loss[loss=0.1089, beats_loss=0.01231, ecapa_loss=0.0002705, whisper_loss=0.09388, over 22145.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01213, ecapa_loss=0.0002672, whisper_loss=0.09753, over 3890936.64 frames. ], batch size: 90, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:18:51,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.83 vs. limit=10.0 2024-08-10 10:18:56,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.07 vs. 
limit=15.0 2024-08-10 10:19:10,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=504570.0, ans=0.125 2024-08-10 10:19:18,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=504570.0, ans=0.125 2024-08-10 10:19:31,718 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 10:19:33,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504670.0, ans=0.1 2024-08-10 10:19:36,454 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7000, loss[loss=0.09928, beats_loss=0.01324, ecapa_loss=0.0002901, whisper_loss=0.08314, over 14870.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01216, ecapa_loss=0.0002667, whisper_loss=0.09726, over 3922168.04 frames. ], batch size: 60, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:19:37,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=504770.0, ans=15.0 2024-08-10 10:19:53,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=504870.0, ans=0.125 2024-08-10 10:20:01,040 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=15.0 2024-08-10 10:20:06,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=504970.0, ans=0.0 2024-08-10 10:20:08,060 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
24 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 10:20:12,826 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.871e+01 3.202e+01 3.824e+01 7.169e+01, threshold=6.405e+01, percent-clipped=1.0 2024-08-10 10:20:20,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=504970.0, ans=10.0 2024-08-10 10:20:21,670 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-10 10:20:21,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=504970.0, ans=0.1 2024-08-10 10:20:24,831 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 10:20:40,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=505170.0, ans=0.125 2024-08-10 10:20:41,899 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 10:20:49,933 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 10:20:57,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7050, loss[loss=0.1151, beats_loss=0.008327, ecapa_loss=0.0002942, whisper_loss=0.1039, over 15793.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.0121, ecapa_loss=0.0002671, whisper_loss=0.09772, over 3927115.10 frames. ], batch size: 61, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:21:03,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=505270.0, ans=0.125 2024-08-10 10:21:03,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. 
limit=15.0 2024-08-10 10:21:11,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.17 vs. limit=10.0 2024-08-10 10:21:15,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=505370.0, ans=0.125 2024-08-10 10:21:27,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=505470.0, ans=0.125 2024-08-10 10:21:33,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=505470.0, ans=0.125 2024-08-10 10:21:41,730 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 10:21:43,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505570.0, ans=0.1 2024-08-10 10:21:45,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=505570.0, ans=0.2 2024-08-10 10:22:02,872 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 13 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 10:22:16,464 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7100, loss[loss=0.1311, beats_loss=0.01122, ecapa_loss=0.0002623, whisper_loss=0.1172, over 21800.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.0121, ecapa_loss=0.0002664, whisper_loss=0.09769, over 3935954.22 frames. ], batch size: 85, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:22:34,670 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.25 vs. limit=22.5 2024-08-10 10:22:36,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.43 vs. 
limit=15.0 2024-08-10 10:22:53,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=505970.0, ans=0.0 2024-08-10 10:22:54,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 3.041e+01 3.472e+01 4.120e+01 8.517e+01, threshold=6.943e+01, percent-clipped=2.0 2024-08-10 10:22:56,363 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 10:23:05,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0 2024-08-10 10:23:09,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.85 vs. limit=22.5 2024-08-10 10:23:11,478 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 10:23:16,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=506070.0, ans=0.125 2024-08-10 10:23:25,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506170.0, ans=0.1 2024-08-10 10:23:26,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=506170.0, ans=0.125 2024-08-10 10:23:36,845 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7150, loss[loss=0.123, beats_loss=0.01005, ecapa_loss=0.0003072, whisper_loss=0.1099, over 17932.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01205, ecapa_loss=0.0002682, whisper_loss=0.09733, over 3907809.37 frames. 
], batch size: 69, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:23:57,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0 2024-08-10 10:24:00,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=506370.0, ans=0.125 2024-08-10 10:24:11,634 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 10:24:11,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=506470.0, ans=0.125 2024-08-10 10:24:13,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=506470.0, ans=0.125 2024-08-10 10:24:15,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=506470.0, ans=0.0 2024-08-10 10:24:35,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=506570.0, ans=0.125 2024-08-10 10:24:47,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=506670.0, ans=0.125 2024-08-10 10:24:54,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=506770.0, ans=0.125 2024-08-10 10:24:55,520 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7200, loss[loss=0.1018, beats_loss=0.0119, ecapa_loss=0.0002708, whisper_loss=0.08718, over 17108.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01206, ecapa_loss=0.0002684, whisper_loss=0.09697, over 3920620.83 frames. 
], batch size: 67, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:24:57,703 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.58 vs. limit=22.5 2024-08-10 10:24:58,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=506770.0, ans=0.0 2024-08-10 10:25:03,747 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.21 vs. limit=22.5 2024-08-10 10:25:11,993 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 10:25:14,075 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.79 vs. limit=15.0 2024-08-10 10:25:35,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.350e+01 3.179e+01 3.637e+01 4.087e+01 6.923e+01, threshold=7.273e+01, percent-clipped=0.0 2024-08-10 10:25:45,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=507070.0, ans=0.1 2024-08-10 10:25:51,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=507070.0, ans=0.09899494936611666 2024-08-10 10:26:12,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=507170.0, ans=0.125 2024-08-10 10:26:18,299 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7250, loss[loss=0.0962, beats_loss=0.01536, ecapa_loss=0.0002723, whisper_loss=0.07812, over 21383.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01206, ecapa_loss=0.0002658, whisper_loss=0.09705, over 3906943.35 frames. 
], batch size: 91, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:26:19,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0 2024-08-10 10:26:36,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=507370.0, ans=0.1 2024-08-10 10:26:48,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=507470.0, ans=0.125 2024-08-10 10:26:58,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=507470.0, ans=0.0 2024-08-10 10:27:23,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=507670.0, ans=0.125 2024-08-10 10:27:37,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7300, loss[loss=0.1174, beats_loss=0.01203, ecapa_loss=0.0002447, whisper_loss=0.1029, over 17199.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01203, ecapa_loss=0.0002681, whisper_loss=0.09731, over 3874212.49 frames. ], batch size: 67, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:27:39,370 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 10:27:41,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2024-08-10 10:27:44,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507770.0, ans=0.1 2024-08-10 10:28:02,941 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 10:28:04,907 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
23 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-10 10:28:16,760 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.965e+01 3.375e+01 3.820e+01 5.473e+01, threshold=6.750e+01, percent-clipped=0.0 2024-08-10 10:28:26,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=508070.0, ans=0.125 2024-08-10 10:28:59,657 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7350, loss[loss=0.1219, beats_loss=0.008926, ecapa_loss=0.0003337, whisper_loss=0.1096, over 16142.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01206, ecapa_loss=0.0002696, whisper_loss=0.09747, over 3892788.22 frames. ], batch size: 63, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:29:05,459 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-10 10:29:28,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=508370.0, ans=0.04949747468305833 2024-08-10 10:29:47,559 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-10 10:29:49,663 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 10:30:07,554 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-08-10 10:30:08,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=508670.0, ans=0.125 2024-08-10 10:30:26,678 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7400, loss[loss=0.114, beats_loss=0.01566, ecapa_loss=0.0002043, whisper_loss=0.09635, over 20914.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01206, ecapa_loss=0.0002682, whisper_loss=0.09741, over 3862272.12 frames. 
], batch size: 82, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:30:31,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.67 vs. limit=22.5 2024-08-10 10:31:05,939 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.905e+01 3.226e+01 3.755e+01 5.990e+01, threshold=6.451e+01, percent-clipped=0.0 2024-08-10 10:31:21,811 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 10:31:31,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=509070.0, ans=0.125 2024-08-10 10:31:33,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=509170.0, ans=0.125 2024-08-10 10:31:33,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=509170.0, ans=0.0 2024-08-10 10:31:47,473 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 10:31:52,393 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7450, loss[loss=0.09384, beats_loss=0.01256, ecapa_loss=0.0003371, whisper_loss=0.07791, over 14500.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01206, ecapa_loss=0.0002682, whisper_loss=0.09734, over 3858435.17 frames. ], batch size: 63, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:31:53,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. 
limit=6.0 2024-08-10 10:31:54,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=509270.0, ans=0.0 2024-08-10 10:32:04,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=509270.0, ans=0.05 2024-08-10 10:32:15,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=509370.0, ans=0.125 2024-08-10 10:32:40,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=509470.0, ans=0.95 2024-08-10 10:32:45,105 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 10:32:55,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=509570.0, ans=0.125 2024-08-10 10:32:57,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.93 vs. limit=22.5 2024-08-10 10:33:11,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=509670.0, ans=0.0 2024-08-10 10:33:18,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7500, loss[loss=0.1291, beats_loss=0.01109, ecapa_loss=0.0002734, whisper_loss=0.1153, over 19100.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01196, ecapa_loss=0.0002696, whisper_loss=0.09798, over 3856512.28 frames. ], batch size: 78, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:33:23,003 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 10:33:30,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.17 vs. 
limit=10.0 2024-08-10 10:33:43,219 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-10 10:33:58,002 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 3.154e+01 3.513e+01 4.160e+01 5.952e+01, threshold=7.025e+01, percent-clipped=0.0 2024-08-10 10:34:02,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=509970.0, ans=0.125 2024-08-10 10:34:06,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=509970.0, ans=0.0 2024-08-10 10:34:07,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=509970.0, ans=0.0 2024-08-10 10:34:29,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=510170.0, ans=0.2 2024-08-10 10:34:30,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=510170.0, ans=0.0 2024-08-10 10:34:34,117 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-10 10:34:43,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7550, loss[loss=0.1308, beats_loss=0.009501, ecapa_loss=0.0003443, whisper_loss=0.1178, over 23052.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01202, ecapa_loss=0.0002693, whisper_loss=0.0971, over 3850918.51 frames. 
], batch size: 96, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:34:48,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=510270.0, ans=0.125 2024-08-10 10:34:52,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=510270.0, ans=22.5 2024-08-10 10:34:55,190 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-10 10:35:04,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=510370.0, ans=0.2 2024-08-10 10:35:08,345 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-10 10:35:23,715 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 10:35:33,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=510570.0, ans=0.125 2024-08-10 10:35:40,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=510570.0, ans=0.125 2024-08-10 10:35:45,782 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 10:36:03,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=510670.0, ans=0.125 2024-08-10 10:36:07,460 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7600, loss[loss=0.1053, beats_loss=0.009255, ecapa_loss=0.0002942, whisper_loss=0.09311, over 22850.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01206, ecapa_loss=0.000269, whisper_loss=0.09599, over 3842101.90 frames. 
], batch size: 93, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:36:13,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=510770.0, ans=0.2 2024-08-10 10:36:17,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=510770.0, ans=0.5 2024-08-10 10:36:19,432 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 10:36:43,243 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 10:36:45,918 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 2.819e+01 3.165e+01 3.521e+01 5.971e+01, threshold=6.331e+01, percent-clipped=0.0 2024-08-10 10:36:50,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=510970.0, ans=0.0 2024-08-10 10:36:50,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=510970.0, ans=0.0 2024-08-10 10:37:01,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=511070.0, ans=0.1 2024-08-10 10:37:03,581 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 43 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 10:37:18,178 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 10:37:32,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=511270.0, ans=0.05 2024-08-10 10:37:34,280 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7650, loss[loss=0.1085, beats_loss=0.0141, ecapa_loss=0.0002353, whisper_loss=0.09205, over 23004.00 frames. 
], tot_loss[loss=0.1114, beats_loss=0.012, ecapa_loss=0.0002687, whisper_loss=0.09668, over 3851929.01 frames. ], batch size: 91, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:37:36,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=511270.0, ans=0.0 2024-08-10 10:37:39,461 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 20 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-10 10:37:43,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.14 vs. limit=22.5 2024-08-10 10:37:48,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=511270.0, ans=0.125 2024-08-10 10:37:51,844 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 10:38:32,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=12.0 2024-08-10 10:38:50,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=511670.0, ans=0.125 2024-08-10 10:38:58,908 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7700, loss[loss=0.1227, beats_loss=0.0107, ecapa_loss=0.0002463, whisper_loss=0.1096, over 23199.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01193, ecapa_loss=0.00027, whisper_loss=0.09679, over 3866265.42 frames. ], batch size: 89, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:39:24,664 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 10:39:29,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=511870.0, ans=0.125 2024-08-10 10:39:39,444 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.498e+01 3.237e+01 3.581e+01 4.281e+01 8.585e+01, threshold=7.162e+01, percent-clipped=2.0 2024-08-10 10:39:54,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=512070.0, ans=0.0 2024-08-10 10:39:58,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=512070.0, ans=0.125 2024-08-10 10:40:00,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=512070.0, ans=0.0 2024-08-10 10:40:22,765 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7750, loss[loss=0.133, beats_loss=0.01068, ecapa_loss=0.0002665, whisper_loss=0.1196, over 22121.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01206, ecapa_loss=0.0002698, whisper_loss=0.09639, over 3893851.70 frames. ], batch size: 88, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:40:23,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=512270.0, ans=0.125 2024-08-10 10:40:26,322 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 10:40:34,417 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-10 10:41:00,323 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
20 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-10 10:41:07,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=512470.0, ans=0.2 2024-08-10 10:41:08,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.82 vs. limit=22.5 2024-08-10 10:41:09,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=512470.0, ans=0.0 2024-08-10 10:41:20,233 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 10:41:27,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=512670.0, ans=0.125 2024-08-10 10:41:45,628 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7800, loss[loss=0.1127, beats_loss=0.01239, ecapa_loss=0.0002394, whisper_loss=0.09791, over 17564.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01208, ecapa_loss=0.000267, whisper_loss=0.09678, over 3930115.06 frames. ], batch size: 67, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:42:14,552 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0 2024-08-10 10:42:19,944 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 10:42:23,038 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 3.017e+01 3.377e+01 3.890e+01 5.572e+01, threshold=6.753e+01, percent-clipped=0.0 2024-08-10 10:42:34,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513070.0, ans=0.1 2024-08-10 10:42:41,363 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 10:42:41,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=513070.0, ans=0.125 2024-08-10 10:42:44,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=513070.0, ans=0.125 2024-08-10 10:42:46,773 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0 2024-08-10 10:43:03,914 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7850, loss[loss=0.1206, beats_loss=0.0107, ecapa_loss=0.0003039, whisper_loss=0.1069, over 19545.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.0121, ecapa_loss=0.0002664, whisper_loss=0.09698, over 3899392.40 frames. ], batch size: 78, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:43:21,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=513370.0, ans=0.125 2024-08-10 10:44:20,901 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 10:44:27,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7900, loss[loss=0.1283, beats_loss=0.01255, ecapa_loss=0.0002696, whisper_loss=0.113, over 23171.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01206, ecapa_loss=0.0002673, whisper_loss=0.09771, over 3913699.97 frames. 
], batch size: 91, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:44:41,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=513770.0, ans=0.125 2024-08-10 10:44:45,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=513870.0, ans=0.125 2024-08-10 10:44:51,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0 2024-08-10 10:45:05,803 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.988e+01 3.259e+01 3.767e+01 5.929e+01, threshold=6.519e+01, percent-clipped=0.0 2024-08-10 10:45:29,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=514070.0, ans=0.0 2024-08-10 10:45:35,935 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 14 from Vox, 50 fro AS 2024-08-10 10:45:41,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=514170.0, ans=0.0 2024-08-10 10:45:47,756 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 10:45:50,660 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 7950, loss[loss=0.1206, beats_loss=0.009683, ecapa_loss=0.0002071, whisper_loss=0.1088, over 17069.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01209, ecapa_loss=0.0002671, whisper_loss=0.09726, over 3886675.91 frames. 
], batch size: 61, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:45:58,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=514270.0, ans=0.1 2024-08-10 10:46:05,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=514370.0, ans=0.0 2024-08-10 10:46:22,407 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 34 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 10:46:27,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=514470.0, ans=0.2 2024-08-10 10:46:28,445 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 10:46:30,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=514470.0, ans=0.07 2024-08-10 10:46:31,897 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 11 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 10:46:40,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=514570.0, ans=0.125 2024-08-10 10:46:49,722 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 10:46:52,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=514570.0, ans=0.125 2024-08-10 10:46:54,440 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 15 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 10:46:57,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=514670.0, ans=0.0 2024-08-10 10:47:12,095 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8000, loss[loss=0.09757, beats_loss=0.01404, ecapa_loss=0.0002059, whisper_loss=0.08147, over 21624.00 frames. 
], tot_loss[loss=0.1115, beats_loss=0.0121, ecapa_loss=0.0002652, whisper_loss=0.09678, over 3864199.04 frames. ], batch size: 87, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:47:19,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=514770.0, ans=0.125 2024-08-10 10:47:43,397 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 10:47:47,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=514970.0, ans=0.2 2024-08-10 10:47:49,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=514970.0, ans=0.0 2024-08-10 10:47:52,755 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 2.849e+01 3.134e+01 3.665e+01 7.663e+01, threshold=6.268e+01, percent-clipped=1.0 2024-08-10 10:48:07,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=515070.0, ans=0.125 2024-08-10 10:48:07,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=515070.0, ans=0.1 2024-08-10 10:48:07,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=515070.0, ans=0.0 2024-08-10 10:48:10,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=515070.0, ans=0.125 2024-08-10 10:48:33,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515170.0, ans=0.1 2024-08-10 10:48:35,105 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
23 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 10:48:37,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.86 vs. limit=15.0 2024-08-10 10:48:40,304 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8050, loss[loss=0.1088, beats_loss=0.01267, ecapa_loss=0.0002923, whisper_loss=0.09326, over 21941.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01209, ecapa_loss=0.0002648, whisper_loss=0.09672, over 3872079.49 frames. ], batch size: 93, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:49:24,195 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 10:49:27,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=515370.0, ans=0.0 2024-08-10 10:49:35,587 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 10:49:51,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=515470.0, ans=0.125 2024-08-10 10:50:04,466 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 10:50:20,911 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-10 10:50:28,040 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 10:50:36,309 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 10:50:41,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8100, loss[loss=0.1023, beats_loss=0.01411, ecapa_loss=0.000239, whisper_loss=0.08579, over 18391.00 frames. 
], tot_loss[loss=0.1111, beats_loss=0.01212, ecapa_loss=0.000267, whisper_loss=0.09633, over 3899304.82 frames. ], batch size: 72, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:51:17,009 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 10:51:19,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=515970.0, ans=0.125 2024-08-10 10:51:20,157 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 3.111e+01 3.674e+01 4.170e+01 5.858e+01, threshold=7.349e+01, percent-clipped=0.0 2024-08-10 10:51:26,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=515970.0, ans=0.2 2024-08-10 10:51:31,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=516070.0, ans=0.125 2024-08-10 10:51:36,825 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2024-08-10 10:51:37,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=516070.0, ans=0.0 2024-08-10 10:51:39,035 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 10:51:50,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=516170.0, ans=0.125 2024-08-10 10:51:58,942 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
14 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 10:51:59,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=516170.0, ans=22.5 2024-08-10 10:52:03,387 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8150, loss[loss=0.114, beats_loss=0.01051, ecapa_loss=0.0002903, whisper_loss=0.1006, over 18455.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01215, ecapa_loss=0.0002677, whisper_loss=0.09579, over 3884744.80 frames. ], batch size: 72, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:52:03,542 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 10:52:07,279 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 10:52:17,848 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=8.38 vs. limit=8.0 2024-08-10 10:52:24,557 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 10:52:32,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=516370.0, ans=0.125 2024-08-10 10:52:34,059 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 10:52:39,090 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 10:52:46,928 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.54 vs. 
limit=15.0 2024-08-10 10:53:06,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=516670.0, ans=0.125 2024-08-10 10:53:08,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=516670.0, ans=0.125 2024-08-10 10:53:13,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=516670.0, ans=0.125 2024-08-10 10:53:14,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=516670.0, ans=0.05 2024-08-10 10:53:22,298 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 10:53:23,793 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8200, loss[loss=0.1203, beats_loss=0.01145, ecapa_loss=0.0002142, whisper_loss=0.1067, over 23632.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01213, ecapa_loss=0.000268, whisper_loss=0.09639, over 3892801.44 frames. ], batch size: 87, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:53:27,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=516770.0, ans=0.0 2024-08-10 10:53:32,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=516770.0, ans=0.0 2024-08-10 10:53:33,351 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 17 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 10:53:49,087 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
27 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-10 10:54:00,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.913e+01 3.375e+01 3.842e+01 5.271e+01, threshold=6.749e+01, percent-clipped=0.0 2024-08-10 10:54:02,333 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.59 vs. limit=22.5 2024-08-10 10:54:11,443 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 10:54:20,644 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-10 10:54:35,129 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 10:54:42,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8250, loss[loss=0.114, beats_loss=0.01181, ecapa_loss=0.0002935, whisper_loss=0.09924, over 21271.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01221, ecapa_loss=0.0002643, whisper_loss=0.09639, over 3908738.98 frames. ], batch size: 87, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:54:43,022 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 10:54:50,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=517270.0, ans=0.0 2024-08-10 10:54:52,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=517270.0, ans=0.125 2024-08-10 10:55:21,681 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 10:55:58,310 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-10 10:56:00,824 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8300, loss[loss=0.1144, beats_loss=0.0101, ecapa_loss=0.000335, whisper_loss=0.1009, over 20246.00 frames. 
], tot_loss[loss=0.1109, beats_loss=0.01223, ecapa_loss=0.0002621, whisper_loss=0.09607, over 3892635.83 frames. ], batch size: 85, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 10:56:03,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0 2024-08-10 10:56:10,126 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 10:56:22,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=517870.0, ans=0.05 2024-08-10 10:56:24,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=517870.0, ans=0.125 2024-08-10 10:56:27,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=517870.0, ans=0.2 2024-08-10 10:56:35,129 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 10:56:36,938 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.966e+01 3.242e+01 3.921e+01 6.642e+01, threshold=6.483e+01, percent-clipped=0.0 2024-08-10 10:57:12,652 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 20 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-10 10:57:24,690 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8350, loss[loss=0.1029, beats_loss=0.008542, ecapa_loss=0.0002835, whisper_loss=0.09154, over 15104.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01234, ecapa_loss=0.0002626, whisper_loss=0.09519, over 3901942.79 frames. ], batch size: 59, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 10:57:29,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=518270.0, ans=0.0 2024-08-10 10:57:30,888 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
20 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-10 10:57:45,513 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-10 10:57:53,069 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 10:57:57,925 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 10:58:06,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=518470.0, ans=0.125 2024-08-10 10:58:15,045 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2024-08-10 10:58:15,799 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 10:58:30,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=518570.0, ans=0.0 2024-08-10 10:58:32,464 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-10 10:58:38,772 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 10:58:43,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=518670.0, ans=0.0 2024-08-10 10:58:45,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=518670.0, ans=0.0 2024-08-10 10:58:48,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=518670.0, ans=0.0 2024-08-10 10:58:57,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=518770.0, ans=0.0 2024-08-10 10:58:59,212 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8400, loss[loss=0.1065, beats_loss=0.009585, ecapa_loss=0.0003225, whisper_loss=0.09368, over 16240.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01227, ecapa_loss=0.000267, whisper_loss=0.09572, over 3909396.47 frames. ], batch size: 65, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 10:59:37,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=518970.0, ans=0.125 2024-08-10 10:59:37,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=518970.0, ans=0.125 2024-08-10 10:59:40,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 3.091e+01 3.394e+01 4.166e+01 8.578e+01, threshold=6.788e+01, percent-clipped=4.0 2024-08-10 10:59:41,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=518970.0, ans=0.125 2024-08-10 11:00:01,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=519070.0, ans=0.125 2024-08-10 11:00:25,540 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-10 11:00:27,195 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8450, loss[loss=0.1117, beats_loss=0.01383, ecapa_loss=0.0002076, whisper_loss=0.09579, over 22150.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01217, ecapa_loss=0.0002678, whisper_loss=0.09562, over 3885220.93 frames. ], batch size: 88, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 11:00:33,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2024-08-10 11:00:53,952 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 11:01:14,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=519470.0, ans=0.0 2024-08-10 11:01:17,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=519470.0, ans=0.125 2024-08-10 11:01:17,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=519470.0, ans=0.0 2024-08-10 11:01:34,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519570.0, ans=0.1 2024-08-10 11:01:54,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8500, loss[loss=0.1248, beats_loss=0.01045, ecapa_loss=0.0002435, whisper_loss=0.1119, over 14596.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01221, ecapa_loss=0.0002689, whisper_loss=0.09565, over 3898018.93 frames. ], batch size: 56, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 11:01:59,433 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
31 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-10 11:02:04,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=519770.0, ans=0.1 2024-08-10 11:02:24,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=519870.0, ans=0.125 2024-08-10 11:02:40,169 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.432e+01 3.171e+01 3.733e+01 4.165e+01 6.058e+01, threshold=7.466e+01, percent-clipped=0.0 2024-08-10 11:02:40,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.92 vs. limit=12.0 2024-08-10 11:03:02,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=520070.0, ans=0.125 2024-08-10 11:03:08,978 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 11:03:15,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=520170.0, ans=0.2 2024-08-10 11:03:24,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=520170.0, ans=0.125 2024-08-10 11:03:26,520 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8550, loss[loss=0.1243, beats_loss=0.01117, ecapa_loss=0.0002084, whisper_loss=0.111, over 22170.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01218, ecapa_loss=0.0002671, whisper_loss=0.09589, over 3916466.16 frames. 
], batch size: 85, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:03:31,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=520270.0, ans=0.2 2024-08-10 11:03:33,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=520270.0, ans=0.0 2024-08-10 11:03:42,124 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 24 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-10 11:03:50,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=520370.0, ans=0.04949747468305833 2024-08-10 11:04:02,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=520470.0, ans=0.2 2024-08-10 11:04:04,837 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.10 vs. limit=10.0 2024-08-10 11:04:16,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=520470.0, ans=0.0 2024-08-10 11:04:46,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=520670.0, ans=0.125 2024-08-10 11:04:57,107 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8600, loss[loss=0.1072, beats_loss=0.01171, ecapa_loss=0.0002212, whisper_loss=0.09331, over 13734.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01207, ecapa_loss=0.0002658, whisper_loss=0.09657, over 3895925.54 frames. ], batch size: 53, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:05:31,692 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
15 from LS+wenet, 18 from Vox, 26 from AS 2024-08-10 11:05:33,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=520970.0, ans=0.035 2024-08-10 11:05:36,342 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+01 3.011e+01 3.429e+01 3.879e+01 6.555e+01, threshold=6.857e+01, percent-clipped=0.0 2024-08-10 11:05:40,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=520970.0, ans=0.125 2024-08-10 11:05:47,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=521070.0, ans=0.125 2024-08-10 11:06:17,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=521170.0, ans=0.0 2024-08-10 11:06:20,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=521170.0, ans=0.125 2024-08-10 11:06:27,293 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8650, loss[loss=0.07187, beats_loss=0.01483, ecapa_loss=0.0002573, whisper_loss=0.05446, over 17005.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01215, ecapa_loss=0.0002638, whisper_loss=0.0963, over 3892826.55 frames. ], batch size: 71, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:06:27,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=521270.0, ans=0.0 2024-08-10 11:06:35,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.35 vs. limit=22.5 2024-08-10 11:06:36,596 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.89 vs. 
limit=15.0 2024-08-10 11:06:45,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.88 vs. limit=22.5 2024-08-10 11:06:56,298 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 17 from Vox, 44 from AS 2024-08-10 11:07:00,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=521370.0, ans=0.125 2024-08-10 11:07:13,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.68 vs. limit=10.0 2024-08-10 11:07:36,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=12.0 2024-08-10 11:07:44,257 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 18 from LS+wenet, 31 from Vox, 32 from AS 2024-08-10 11:07:48,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=521670.0, ans=0.035 2024-08-10 11:07:50,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.72 vs. limit=10.0 2024-08-10 11:07:51,626 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 23 from Vox, 33 from AS 2024-08-10 11:07:57,421 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8700, loss[loss=0.1173, beats_loss=0.01105, ecapa_loss=0.0002555, whisper_loss=0.1037, over 16462.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01219, ecapa_loss=0.0002648, whisper_loss=0.09626, over 3896924.81 frames. 
], batch size: 65, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:08:24,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=521870.0, ans=0.0 2024-08-10 11:08:28,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=521870.0, ans=0.0 2024-08-10 11:08:37,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.909e+01 3.289e+01 3.792e+01 9.063e+01, threshold=6.579e+01, percent-clipped=1.0 2024-08-10 11:08:52,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=522070.0, ans=0.1 2024-08-10 11:08:54,297 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2024-08-10 11:09:01,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=522070.0, ans=0.0 2024-08-10 11:09:02,744 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 26 from Vox, 31 from AS 2024-08-10 11:09:12,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.29 vs. limit=10.0 2024-08-10 11:09:25,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8750, loss[loss=0.1246, beats_loss=0.01075, ecapa_loss=0.0003016, whisper_loss=0.1108, over 19984.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01219, ecapa_loss=0.0002646, whisper_loss=0.09662, over 3897928.63 frames. ], batch size: 81, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:09:25,774 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
27 from LS+wenet, 27 from Vox, 39 from AS 2024-08-10 11:09:31,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=522270.0, ans=0.2 2024-08-10 11:09:43,525 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 19 from Vox, 48 from AS 2024-08-10 11:09:45,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=522370.0, ans=0.0 2024-08-10 11:09:58,419 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 from AS 2024-08-10 11:10:02,560 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 11:10:08,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=522470.0, ans=0.125 2024-08-10 11:10:30,823 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 27 from Vox, 24 from AS 2024-08-10 11:10:34,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=522670.0, ans=0.0 2024-08-10 11:10:52,246 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8800, loss[loss=0.1069, beats_loss=0.01125, ecapa_loss=0.0002512, whisper_loss=0.09309, over 13581.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01223, ecapa_loss=0.0002641, whisper_loss=0.09601, over 3904268.97 frames. ], batch size: 53, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:10:58,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=522770.0, ans=0.0 2024-08-10 11:11:27,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.98 vs. 
limit=12.0 2024-08-10 11:11:27,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=22.5 2024-08-10 11:11:31,477 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0 2024-08-10 11:11:32,289 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.151e+01 3.444e+01 3.946e+01 7.427e+01, threshold=6.887e+01, percent-clipped=2.0 2024-08-10 11:11:40,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=522970.0, ans=0.125 2024-08-10 11:11:53,434 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 from AS 2024-08-10 11:12:07,497 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.17 vs. limit=15.0 2024-08-10 11:12:13,309 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 28 from Vox, 31 from AS 2024-08-10 11:12:13,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=523170.0, ans=0.0 2024-08-10 11:12:14,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=523170.0, ans=0.0 2024-08-10 11:12:21,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8850, loss[loss=0.1291, beats_loss=0.006964, ecapa_loss=0.000352, whisper_loss=0.1186, over 16870.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01219, ecapa_loss=0.0002648, whisper_loss=0.09576, over 3902855.85 frames. 
], batch size: 67, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:12:23,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=523270.0, ans=0.1 2024-08-10 11:12:27,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=523270.0, ans=0.125 2024-08-10 11:12:37,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=523370.0, ans=0.2 2024-08-10 11:12:45,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=523370.0, ans=0.2 2024-08-10 11:12:55,609 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 14 from Vox, 35 from AS 2024-08-10 11:13:01,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=523470.0, ans=0.1 2024-08-10 11:13:11,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=12.0 2024-08-10 11:13:17,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=523570.0, ans=0.125 2024-08-10 11:13:51,197 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8900, loss[loss=0.0999, beats_loss=0.01389, ecapa_loss=0.0002036, whisper_loss=0.08397, over 21689.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01225, ecapa_loss=0.0002638, whisper_loss=0.09506, over 3886750.86 frames. ], batch size: 86, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:13:53,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=523770.0, ans=0.125 2024-08-10 11:14:10,440 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
26 from LS+wenet, 19 from Vox, 34 from AS 2024-08-10 11:14:28,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=523970.0, ans=0.07 2024-08-10 11:14:29,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=523970.0, ans=0.125 2024-08-10 11:14:35,415 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.997e+01 3.258e+01 3.778e+01 5.539e+01, threshold=6.517e+01, percent-clipped=0.0 2024-08-10 11:14:37,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=523970.0, ans=0.0 2024-08-10 11:14:39,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=523970.0, ans=0.0 2024-08-10 11:14:49,874 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 from AS 2024-08-10 11:15:06,610 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.68 vs. limit=15.0 2024-08-10 11:15:07,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=524170.0, ans=0.0 2024-08-10 11:15:10,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=524170.0, ans=0.0 2024-08-10 11:15:16,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=524170.0, ans=0.125 2024-08-10 11:15:17,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=524170.0, ans=0.2 2024-08-10 11:15:17,808 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.69 vs. 
limit=22.5 2024-08-10 11:15:22,557 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 8950, loss[loss=0.1057, beats_loss=0.0134, ecapa_loss=0.0002278, whisper_loss=0.08998, over 18132.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01219, ecapa_loss=0.0002647, whisper_loss=0.09471, over 3865206.17 frames. ], batch size: 71, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:15:33,272 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 26 from LS+wenet, 23 from Vox, 47 from AS 2024-08-10 11:15:36,804 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 13 from Vox, 31 from AS 2024-08-10 11:15:42,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=524370.0, ans=0.125 2024-08-10 11:15:43,820 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 17 from Vox, 20 from AS 2024-08-10 11:15:55,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=524470.0, ans=0.125 2024-08-10 11:16:05,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=524470.0, ans=0.125 2024-08-10 11:16:49,801 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9000, loss[loss=0.1342, beats_loss=0.01039, ecapa_loss=0.0002591, whisper_loss=0.1212, over 18000.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01212, ecapa_loss=0.0002645, whisper_loss=0.09564, over 3878855.48 frames. ], batch size: 69, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:16:49,801 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 11:17:36,011 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on ASR_libri: loss=0.2658, beats_loss=0, ecapa_loss=0.000793, whisper_loss=0.2579, over 922467.00 frames. 
2024-08-10 11:17:45,895 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2550, 4.0275, 3.8816, 3.4372], device='cuda:2') 2024-08-10 11:17:54,653 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on SV_voxceleb1: loss=0.007025, beats_loss=0, ecapa_loss=0.0007025, whisper_loss=0, over 939242.00 frames. 2024-08-10 11:19:54,491 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on AT_audioset: loss=0.02753, beats_loss=0.02753, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 11:19:54,495 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 11:20:00,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=524770.0, ans=0.125 2024-08-10 11:20:15,339 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 26 from Vox, 40 from AS 2024-08-10 11:20:24,257 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 21 from Vox, 34 from AS 2024-08-10 11:20:25,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=524970.0, ans=0.125 2024-08-10 11:20:27,976 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 from AS 2024-08-10 11:20:33,303 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.326e+01 3.014e+01 3.320e+01 3.675e+01 5.799e+01, threshold=6.641e+01, percent-clipped=0.0 2024-08-10 11:20:43,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5 2024-08-10 11:20:51,711 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 16 from Vox, 43 from AS 2024-08-10 11:21:05,281 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
17 from LS+wenet, 16 from Vox, 22 from AS 2024-08-10 11:21:05,842 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0 2024-08-10 11:21:19,417 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9050, loss[loss=0.0993, beats_loss=0.01274, ecapa_loss=0.0002774, whisper_loss=0.08378, over 20494.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01209, ecapa_loss=0.0002639, whisper_loss=0.09608, over 3881564.24 frames. ], batch size: 87, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:21:22,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2024-08-10 11:21:28,928 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 22 from Vox, 31 from AS 2024-08-10 11:21:56,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=525470.0, ans=0.0 2024-08-10 11:22:00,436 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 23 from Vox, 23 from AS 2024-08-10 11:22:03,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=525470.0, ans=0.1 2024-08-10 11:22:07,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=525470.0, ans=0.0 2024-08-10 11:22:24,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=525570.0, ans=0.125 2024-08-10 11:22:24,983 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.02 vs. 
limit=22.5 2024-08-10 11:22:34,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=525670.0, ans=0.2 2024-08-10 11:22:43,316 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9100, loss[loss=0.1091, beats_loss=0.01456, ecapa_loss=0.0002344, whisper_loss=0.09218, over 22428.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.012, ecapa_loss=0.0002661, whisper_loss=0.09662, over 3859610.41 frames. ], batch size: 88, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:22:43,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=525770.0, ans=0.125 2024-08-10 11:22:48,540 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 from AS 2024-08-10 11:22:51,395 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 from AS 2024-08-10 11:23:01,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=525870.0, ans=0.0 2024-08-10 11:23:06,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=525870.0, ans=0.05 2024-08-10 11:23:20,352 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.970e+01 3.325e+01 3.905e+01 6.354e+01, threshold=6.649e+01, percent-clipped=0.0 2024-08-10 11:23:25,628 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:23:25,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=525970.0, ans=0.125 2024-08-10 11:23:30,392 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 11 from Vox, 27 from AS 2024-08-10 11:23:41,572 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
38 from LS+wenet, 19 from Vox, 33 from AS 2024-08-10 11:23:55,580 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2024-08-10 11:24:03,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9150, loss[loss=0.1156, beats_loss=0.01329, ecapa_loss=0.000304, whisper_loss=0.09929, over 21693.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01209, ecapa_loss=0.0002633, whisper_loss=0.09621, over 3875712.15 frames. ], batch size: 93, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:24:25,899 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 16 from LS+wenet, 28 from Vox, 41 from AS 2024-08-10 11:24:31,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=526370.0, ans=15.0 2024-08-10 11:24:39,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=526470.0, ans=0.0 2024-08-10 11:24:44,022 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.48 vs. limit=10.0 2024-08-10 11:24:51,432 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.16 vs. limit=10.0 2024-08-10 11:24:55,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=526570.0, ans=0.125 2024-08-10 11:24:59,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=526570.0, ans=0.125 2024-08-10 11:25:01,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.27 vs. limit=12.0 2024-08-10 11:25:03,209 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
18 from LS+wenet, 25 from Vox, 52 from AS 2024-08-10 11:25:06,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=526670.0, ans=0.125 2024-08-10 11:25:18,171 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9200, loss[loss=0.1162, beats_loss=0.009994, ecapa_loss=0.0003071, whisper_loss=0.1031, over 14679.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01222, ecapa_loss=0.0002627, whisper_loss=0.09518, over 3891933.08 frames. ], batch size: 59, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:25:19,782 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 from AS 2024-08-10 11:25:20,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-08-10 11:25:22,101 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.92 vs. limit=15.0 2024-08-10 11:25:25,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=526770.0, ans=0.0 2024-08-10 11:25:26,416 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 from AS 2024-08-10 11:25:29,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=526770.0, ans=0.125 2024-08-10 11:25:30,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=526870.0, ans=0.1 2024-08-10 11:25:33,050 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.00 vs. 
limit=22.5 2024-08-10 11:25:37,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=526870.0, ans=0.125 2024-08-10 11:25:49,032 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.987e+01 3.332e+01 3.744e+01 5.839e+01, threshold=6.663e+01, percent-clipped=0.0 2024-08-10 11:25:49,299 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 from AS 2024-08-10 11:25:54,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=526970.0, ans=0.125 2024-08-10 11:25:56,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=526970.0, ans=0.125 2024-08-10 11:25:58,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=527070.0, ans=0.0 2024-08-10 11:26:10,654 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 41 from LS+wenet, 18 from Vox, 30 from AS 2024-08-10 11:26:19,906 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 21 from Vox, 41 from AS 2024-08-10 11:26:22,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=527170.0, ans=0.05 2024-08-10 11:26:24,918 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9250, loss[loss=0.1143, beats_loss=0.01192, ecapa_loss=0.0002379, whisper_loss=0.09997, over 17131.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01206, ecapa_loss=0.0002669, whisper_loss=0.09592, over 3905047.01 frames. ], batch size: 67, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:26:32,691 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
31 from LS+wenet, 25 from Vox, 26 from AS 2024-08-10 11:26:32,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=527270.0, ans=0.125 2024-08-10 11:26:35,618 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 11 from LS+wenet, 20 from Vox, 27 from AS 2024-08-10 11:26:50,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=527470.0, ans=0.0 2024-08-10 11:26:57,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-08-10 11:27:00,037 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 from AS 2024-08-10 11:27:04,014 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 from AS 2024-08-10 11:27:12,456 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 from AS 2024-08-10 11:27:17,445 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 from AS 2024-08-10 11:27:30,347 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9300, loss[loss=0.1036, beats_loss=0.01432, ecapa_loss=0.0002394, whisper_loss=0.08686, over 22108.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01205, ecapa_loss=0.0002654, whisper_loss=0.09662, over 3916497.57 frames. ], batch size: 92, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:27:33,770 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 from AS 2024-08-10 11:27:36,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.64 vs. 
limit=15.0 2024-08-10 11:27:54,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=527870.0, ans=0.1 2024-08-10 11:28:00,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=527970.0, ans=0.2 2024-08-10 11:28:02,806 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+01 2.982e+01 3.468e+01 4.140e+01 6.249e+01, threshold=6.936e+01, percent-clipped=0.0 2024-08-10 11:28:08,968 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 from AS 2024-08-10 11:28:16,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=528070.0, ans=0.125 2024-08-10 11:28:24,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=528070.0, ans=0.0 2024-08-10 11:28:26,940 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 from AS 2024-08-10 11:28:41,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9350, loss[loss=0.1037, beats_loss=0.01129, ecapa_loss=0.0002743, whisper_loss=0.08962, over 20350.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01204, ecapa_loss=0.0002658, whisper_loss=0.09651, over 3888642.08 frames. ], batch size: 80, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:28:47,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2024-08-10 11:28:50,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528270.0, ans=0.1 2024-08-10 11:29:01,319 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 14 from Vox, 39 from AS 2024-08-10 11:29:02,597 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 18 from Vox, 43 from AS 2024-08-10 11:29:05,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=528370.0, ans=0.125 2024-08-10 11:29:14,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=528470.0, ans=0.0 2024-08-10 11:29:41,192 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 from AS 2024-08-10 11:29:45,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=528670.0, ans=0.0 2024-08-10 11:29:47,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=528670.0, ans=0.125 2024-08-10 11:29:50,105 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 12 from Vox, 39 from AS 2024-08-10 11:29:53,605 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9400, loss[loss=0.1181, beats_loss=0.008992, ecapa_loss=0.0002489, whisper_loss=0.1067, over 15254.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01211, ecapa_loss=0.0002659, whisper_loss=0.09581, over 3857249.83 frames. ], batch size: 58, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:30:30,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=528970.0, ans=0.125 2024-08-10 11:30:31,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.405e+01 3.113e+01 3.432e+01 4.042e+01 8.997e+01, threshold=6.863e+01, percent-clipped=2.0 2024-08-10 11:30:32,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=528970.0, ans=0.125 2024-08-10 11:31:05,166 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 26 from Vox, 36 from AS 2024-08-10 11:31:10,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9450, loss[loss=0.1019, beats_loss=0.01086, ecapa_loss=0.0002534, whisper_loss=0.08849, over 14849.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01205, ecapa_loss=0.0002666, whisper_loss=0.09561, over 3841998.91 frames. ], batch size: 57, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:31:19,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=529270.0, ans=0.125 2024-08-10 11:31:44,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529470.0, ans=0.1 2024-08-10 11:32:08,278 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 from AS 2024-08-10 11:32:09,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.93 vs. limit=15.0 2024-08-10 11:32:14,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=529670.0, ans=0.2 2024-08-10 11:32:27,195 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9500, loss[loss=0.1039, beats_loss=0.01329, ecapa_loss=0.0002535, whisper_loss=0.08803, over 14953.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01208, ecapa_loss=0.0002668, whisper_loss=0.0953, over 3880248.06 frames. ], batch size: 57, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:32:29,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=529770.0, ans=0.2 2024-08-10 11:32:40,682 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 11:32:42,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=529870.0, ans=0.125 2024-08-10 11:32:44,810 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 11:32:55,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=529970.0, ans=0.025 2024-08-10 11:33:00,303 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.898e+01 3.217e+01 3.700e+01 5.976e+01, threshold=6.434e+01, percent-clipped=0.0 2024-08-10 11:33:12,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=530070.0, ans=0.0 2024-08-10 11:33:14,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=530070.0, ans=0.0 2024-08-10 11:33:34,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=530170.0, ans=0.2 2024-08-10 11:33:35,646 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 11:33:38,065 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9550, loss[loss=0.103, beats_loss=0.01239, ecapa_loss=0.0002308, whisper_loss=0.08834, over 19436.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01209, ecapa_loss=0.0002658, whisper_loss=0.09443, over 3844533.62 frames. 
], batch size: 75, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:33:39,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=530270.0, ans=0.04949747468305833 2024-08-10 11:33:49,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=530270.0, ans=0.1 2024-08-10 11:33:57,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=530370.0, ans=0.0 2024-08-10 11:34:10,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=530470.0, ans=0.2 2024-08-10 11:34:39,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=530670.0, ans=0.0 2024-08-10 11:34:45,687 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9600, loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.0003303, whisper_loss=0.08943, over 21856.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.0121, ecapa_loss=0.0002663, whisper_loss=0.09445, over 3841013.58 frames. ], batch size: 91, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:34:49,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=530770.0, ans=0.07 2024-08-10 11:34:55,221 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-10 11:35:02,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=530870.0, ans=0.09899494936611666 2024-08-10 11:35:16,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.859e+01 3.374e+01 4.050e+01 6.854e+01, threshold=6.749e+01, percent-clipped=1.0 2024-08-10 11:35:20,519 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
21 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-10 11:35:35,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2024-08-10 11:35:45,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=531170.0, ans=0.0 2024-08-10 11:35:51,451 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9650, loss[loss=0.1175, beats_loss=0.01335, ecapa_loss=0.0002993, whisper_loss=0.1011, over 21884.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01204, ecapa_loss=0.0002669, whisper_loss=0.09426, over 3823566.21 frames. ], batch size: 92, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:36:05,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=531370.0, ans=0.2 2024-08-10 11:36:09,115 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 11:36:17,062 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-10 11:36:20,833 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 28 from Vox, 21 fro AS 2024-08-10 11:36:40,791 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-08-10 11:36:53,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=531770.0, ans=0.05 2024-08-10 11:36:55,009 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9700, loss[loss=0.1068, beats_loss=0.01128, ecapa_loss=0.0002795, whisper_loss=0.09275, over 22992.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01212, ecapa_loss=0.0002661, whisper_loss=0.09473, over 3846303.54 frames. 
], batch size: 95, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:37:00,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=531770.0, ans=0.0 2024-08-10 11:37:01,791 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 11:37:03,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=531770.0, ans=0.125 2024-08-10 11:37:04,475 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:37:24,751 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.979e+01 3.402e+01 3.794e+01 6.549e+01, threshold=6.804e+01, percent-clipped=0.0 2024-08-10 11:37:35,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=532070.0, ans=0.0 2024-08-10 11:38:00,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9750, loss[loss=0.128, beats_loss=0.01151, ecapa_loss=0.0002594, whisper_loss=0.1139, over 20337.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01214, ecapa_loss=0.0002646, whisper_loss=0.09477, over 3856466.59 frames. ], batch size: 77, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:38:05,587 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 44 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 11:38:17,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=532370.0, ans=0.0 2024-08-10 11:38:18,587 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 11:38:24,521 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 11:38:30,801 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
22 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-10 11:38:34,022 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.26 vs. limit=22.5 2024-08-10 11:39:06,577 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9800, loss[loss=0.1015, beats_loss=0.009932, ecapa_loss=0.000357, whisper_loss=0.08797, over 14757.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01207, ecapa_loss=0.0002666, whisper_loss=0.09465, over 3843628.73 frames. ], batch size: 63, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:39:09,230 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 27 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-10 11:39:11,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=532770.0, ans=0.09899494936611666 2024-08-10 11:39:18,235 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 11:39:36,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 3.053e+01 3.396e+01 3.815e+01 6.772e+01, threshold=6.792e+01, percent-clipped=0.0 2024-08-10 11:40:01,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=533170.0, ans=0.125 2024-08-10 11:40:11,254 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9850, loss[loss=0.1082, beats_loss=0.01084, ecapa_loss=0.000199, whisper_loss=0.09535, over 15898.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01201, ecapa_loss=0.0002652, whisper_loss=0.09591, over 3851639.52 frames. ], batch size: 58, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:40:11,473 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 11:40:14,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.57 vs. 
limit=12.0 2024-08-10 11:40:20,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.33 vs. limit=15.0 2024-08-10 11:40:42,373 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 11:40:44,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533470.0, ans=0.1 2024-08-10 11:40:45,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=533470.0, ans=0.125 2024-08-10 11:40:46,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=533470.0, ans=0.125 2024-08-10 11:40:49,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=533570.0, ans=0.125 2024-08-10 11:40:57,469 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 11 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 11:40:57,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=533570.0, ans=0.0 2024-08-10 11:40:58,996 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:41:13,174 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:41:15,631 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9900, loss[loss=0.1049, beats_loss=0.01133, ecapa_loss=0.0002521, whisper_loss=0.09109, over 14821.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01199, ecapa_loss=0.0002624, whisper_loss=0.0962, over 3869546.53 frames. 
], batch size: 59, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:41:20,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=533770.0, ans=0.2 2024-08-10 11:41:26,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=533770.0, ans=0.125 2024-08-10 11:41:35,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=533870.0, ans=0.125 2024-08-10 11:41:44,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=533970.0, ans=0.125 2024-08-10 11:41:44,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=533970.0, ans=0.1 2024-08-10 11:41:45,713 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.867e+01 3.339e+01 3.780e+01 5.864e+01, threshold=6.678e+01, percent-clipped=0.0 2024-08-10 11:41:45,928 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 11:41:51,172 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 14 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 11:41:57,773 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 11:41:59,981 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.02 vs. 
limit=15.0 2024-08-10 11:42:03,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=534070.0, ans=0.0 2024-08-10 11:42:04,336 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:42:06,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=534170.0, ans=0.125 2024-08-10 11:42:14,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534170.0, ans=0.1 2024-08-10 11:42:20,687 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 9950, loss[loss=0.125, beats_loss=0.01006, ecapa_loss=0.0003221, whisper_loss=0.1118, over 22498.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01199, ecapa_loss=0.0002631, whisper_loss=0.09577, over 3853040.72 frames. ], batch size: 93, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:42:21,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=534270.0, ans=0.0 2024-08-10 11:42:37,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0 2024-08-10 11:42:37,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=534370.0, ans=0.125 2024-08-10 11:42:47,835 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 11:42:49,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534470.0, ans=0.1 2024-08-10 11:43:08,611 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
19 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-10 11:43:20,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=534670.0, ans=0.1 2024-08-10 11:43:25,437 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10000, loss[loss=0.07851, beats_loss=0.01343, ecapa_loss=0.0001926, whisper_loss=0.06315, over 15673.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01203, ecapa_loss=0.000264, whisper_loss=0.0957, over 3847301.42 frames. ], batch size: 60, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:43:27,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534770.0, ans=0.1 2024-08-10 11:43:32,452 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-08-10 11:43:35,953 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 11:43:41,947 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-08-10 11:43:52,907 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 35 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 11:43:53,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=534970.0, ans=0.0 2024-08-10 11:43:55,610 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.932e+01 3.270e+01 3.845e+01 5.958e+01, threshold=6.541e+01, percent-clipped=0.0 2024-08-10 11:43:55,831 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
30 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 11:43:56,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534970.0, ans=0.1 2024-08-10 11:44:11,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=535070.0, ans=0.125 2024-08-10 11:44:20,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535170.0, ans=0.1 2024-08-10 11:44:24,238 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 25 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-10 11:44:29,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=535270.0, ans=0.125 2024-08-10 11:44:30,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10050, loss[loss=0.127, beats_loss=0.007403, ecapa_loss=0.0003099, whisper_loss=0.1165, over 16801.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01193, ecapa_loss=0.0002631, whisper_loss=0.09611, over 3834629.39 frames. ], batch size: 63, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:45:19,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=535570.0, ans=0.0 2024-08-10 11:45:25,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=535670.0, ans=0.125 2024-08-10 11:45:29,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=535670.0, ans=0.125 2024-08-10 11:45:35,742 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10100, loss[loss=0.1099, beats_loss=0.01027, ecapa_loss=0.0003254, whisper_loss=0.09637, over 18645.00 frames. 
], tot_loss[loss=0.1113, beats_loss=0.01187, ecapa_loss=0.0002656, whisper_loss=0.09674, over 3837539.93 frames. ], batch size: 76, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:45:44,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=535770.0, ans=0.125 2024-08-10 11:46:05,959 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 3.067e+01 3.527e+01 4.291e+01 1.159e+02, threshold=7.053e+01, percent-clipped=2.0 2024-08-10 11:46:06,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-08-10 11:46:10,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=535970.0, ans=0.125 2024-08-10 11:46:12,351 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 18 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 11:46:19,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=536070.0, ans=0.0 2024-08-10 11:46:31,589 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 11:46:31,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=536170.0, ans=0.0 2024-08-10 11:46:32,922 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 11:46:40,443 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10150, loss[loss=0.1238, beats_loss=0.01228, ecapa_loss=0.0002247, whisper_loss=0.1092, over 18264.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.0119, ecapa_loss=0.0002663, whisper_loss=0.09635, over 3836782.19 frames. 
], batch size: 69, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:46:40,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=536270.0, ans=0.125 2024-08-10 11:46:43,427 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 11:46:44,785 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-10 11:46:46,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=536270.0, ans=0.0 2024-08-10 11:46:47,487 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 11:47:05,279 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 11:47:18,182 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-10 11:47:30,740 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-10 11:47:34,656 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2024-08-10 11:47:39,445 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 11:47:56,939 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-08-10 11:47:57,628 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10200, loss[loss=0.0932, beats_loss=0.01234, ecapa_loss=0.0002736, whisper_loss=0.07812, over 22673.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01193, ecapa_loss=0.0002633, whisper_loss=0.09565, over 3827781.02 frames. 
], batch size: 94, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:48:22,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=536870.0, ans=0.2 2024-08-10 11:48:22,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=536870.0, ans=0.0 2024-08-10 11:48:34,799 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 3.065e+01 3.411e+01 3.914e+01 6.071e+01, threshold=6.821e+01, percent-clipped=0.0 2024-08-10 11:49:07,163 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 11:49:09,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=537170.0, ans=0.2 2024-08-10 11:49:14,520 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=12.0 2024-08-10 11:49:15,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537170.0, ans=0.1 2024-08-10 11:49:15,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=537170.0, ans=0.125 2024-08-10 11:49:19,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=537270.0, ans=0.125 2024-08-10 11:49:20,112 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10250, loss[loss=0.1387, beats_loss=0.01026, ecapa_loss=0.0002509, whisper_loss=0.126, over 22850.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.012, ecapa_loss=0.0002627, whisper_loss=0.09601, over 3871593.59 frames. ], batch size: 88, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:49:22,172 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 11:49:53,721 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0 2024-08-10 11:50:21,707 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:50:21,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=537570.0, ans=0.125 2024-08-10 11:50:35,580 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 19 from LS+wenet, 26 from Vox, 48 fro AS 2024-08-10 11:50:44,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10300, loss[loss=0.1156, beats_loss=0.01102, ecapa_loss=0.0002746, whisper_loss=0.1018, over 16511.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01211, ecapa_loss=0.0002621, whisper_loss=0.09567, over 3887451.80 frames. ], batch size: 66, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:50:49,590 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 16 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 11:50:51,088 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 11:51:12,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=537870.0, ans=0.125 2024-08-10 11:51:20,610 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 3.115e+01 3.523e+01 4.089e+01 1.199e+02, threshold=7.045e+01, percent-clipped=1.0 2024-08-10 11:51:21,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=537970.0, ans=0.0 2024-08-10 11:51:24,342 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 11:51:32,151 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
23 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-10 11:51:49,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=538170.0, ans=0.125 2024-08-10 11:51:50,894 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 11:51:54,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=538170.0, ans=0.1 2024-08-10 11:52:04,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10350, loss[loss=0.09326, beats_loss=0.01356, ecapa_loss=0.0002789, whisper_loss=0.07691, over 20399.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01222, ecapa_loss=0.0002612, whisper_loss=0.09546, over 3900962.97 frames. ], batch size: 84, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:52:24,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.63 vs. limit=15.0 2024-08-10 11:52:36,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=538470.0, ans=0.1 2024-08-10 11:52:38,411 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.273e-02 2024-08-10 11:52:58,290 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 13 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 11:53:01,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=538570.0, ans=0.1 2024-08-10 11:53:05,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=538570.0, ans=15.0 2024-08-10 11:53:25,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10400, loss[loss=0.115, beats_loss=0.01209, ecapa_loss=0.0002777, whisper_loss=0.1002, over 17963.00 frames. 
], tot_loss[loss=0.1112, beats_loss=0.01212, ecapa_loss=0.0002603, whisper_loss=0.09651, over 3901768.33 frames. ], batch size: 78, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:53:32,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538770.0, ans=0.1 2024-08-10 11:53:42,544 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 11:54:01,479 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.334e+01 2.892e+01 3.209e+01 3.631e+01 5.476e+01, threshold=6.418e+01, percent-clipped=0.0 2024-08-10 11:54:25,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=539070.0, ans=0.125 2024-08-10 11:54:28,968 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 11:54:35,340 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 11:54:44,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10450, loss[loss=0.1125, beats_loss=0.01147, ecapa_loss=0.0002438, whisper_loss=0.09863, over 15960.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01212, ecapa_loss=0.0002608, whisper_loss=0.09591, over 3875721.53 frames. ], batch size: 58, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:54:51,864 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 11:54:52,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=539270.0, ans=0.125 2024-08-10 11:54:57,048 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 11:54:58,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.89 vs. limit=22.5 2024-08-10 11:55:23,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=539470.0, ans=0.1 2024-08-10 11:55:30,438 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.99 vs. limit=22.5 2024-08-10 11:55:36,349 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2024-08-10 11:55:37,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=539570.0, ans=0.125 2024-08-10 11:55:38,733 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 11:55:42,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=539570.0, ans=0.125 2024-08-10 11:55:55,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=539670.0, ans=0.0 2024-08-10 11:55:56,892 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 11:55:58,033 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 14 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-10 11:56:01,623 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10500, loss[loss=0.1248, beats_loss=0.008686, ecapa_loss=0.0002985, whisper_loss=0.1131, over 22739.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01209, ecapa_loss=0.0002618, whisper_loss=0.09607, over 3891834.57 frames. 
], batch size: 93, lr: 1.45e-02, grad_scale: 2147483648.0 2024-08-10 11:56:01,874 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 11:56:17,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2024-08-10 11:56:29,339 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 17 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 11:56:29,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=539970.0, ans=0.0 2024-08-10 11:56:33,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 3.038e+01 3.459e+01 3.996e+01 6.342e+01, threshold=6.919e+01, percent-clipped=0.0 2024-08-10 11:56:35,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=539970.0, ans=0.125 2024-08-10 11:56:46,983 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 27 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-10 11:56:58,040 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-10 11:57:08,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=540270.0, ans=0.0 2024-08-10 11:57:09,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10550, loss[loss=0.1149, beats_loss=0.0107, ecapa_loss=0.0002359, whisper_loss=0.1019, over 22144.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.0121, ecapa_loss=0.0002608, whisper_loss=0.09578, over 3888296.39 frames. 
], batch size: 85, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 11:57:09,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=540270.0, ans=0.09899494936611666 2024-08-10 11:57:16,123 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 11:57:20,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=540270.0, ans=0.125 2024-08-10 11:57:26,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=540370.0, ans=10.0 2024-08-10 11:57:34,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-08-10 11:57:46,877 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.52 vs. limit=12.0 2024-08-10 11:57:53,311 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 11:57:54,558 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 11:58:05,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=540670.0, ans=0.0 2024-08-10 11:58:10,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-08-10 11:58:17,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=540770.0, ans=0.0 2024-08-10 11:58:18,500 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10600, loss[loss=0.1041, beats_loss=0.01243, ecapa_loss=0.0002549, whisper_loss=0.0891, over 21992.00 frames. 
], tot_loss[loss=0.11, beats_loss=0.01212, ecapa_loss=0.0002608, whisper_loss=0.09527, over 3892453.56 frames. ], batch size: 89, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 11:58:23,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=540770.0, ans=0.125 2024-08-10 11:58:24,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=540770.0, ans=0.125 2024-08-10 11:58:49,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 2.966e+01 3.321e+01 3.773e+01 6.212e+01, threshold=6.641e+01, percent-clipped=0.0 2024-08-10 11:58:49,840 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 11:58:50,963 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 11:59:07,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=541070.0, ans=0.125 2024-08-10 11:59:13,626 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-10 11:59:17,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=541170.0, ans=0.2 2024-08-10 11:59:25,363 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10650, loss[loss=0.1134, beats_loss=0.01256, ecapa_loss=0.0002197, whisper_loss=0.09869, over 23764.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.012, ecapa_loss=0.0002599, whisper_loss=0.09587, over 3870284.67 frames. ], batch size: 88, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 11:59:25,482 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 11:59:28,609 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2024-08-10 11:59:29,709 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.312e+00 2024-08-10 11:59:31,899 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-10 11:59:33,265 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 11:59:55,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=541470.0, ans=0.125 2024-08-10 11:59:59,066 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:59:59,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=541470.0, ans=0.2 2024-08-10 12:00:19,304 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 12:00:19,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=541670.0, ans=0.125 2024-08-10 12:00:31,364 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10700, loss[loss=0.1335, beats_loss=0.008563, ecapa_loss=0.0002516, whisper_loss=0.1224, over 16190.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01196, ecapa_loss=0.0002602, whisper_loss=0.09644, over 3881919.70 frames. ], batch size: 57, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:00:31,562 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 12:00:34,166 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
14 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 12:00:35,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=541770.0, ans=0.04949747468305833 2024-08-10 12:00:41,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=541770.0, ans=0.125 2024-08-10 12:00:48,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541870.0, ans=0.1 2024-08-10 12:00:55,937 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 15 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 12:01:02,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.02 vs. limit=15.0 2024-08-10 12:01:03,136 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.479e+01 3.156e+01 3.555e+01 4.088e+01 6.627e+01, threshold=7.109e+01, percent-clipped=0.0 2024-08-10 12:01:11,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=542070.0, ans=0.0 2024-08-10 12:01:12,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=542070.0, ans=0.0 2024-08-10 12:01:15,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-10 12:01:39,364 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10750, loss[loss=0.1292, beats_loss=0.01322, ecapa_loss=0.0002399, whisper_loss=0.1136, over 20993.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01204, ecapa_loss=0.0002582, whisper_loss=0.09697, over 3898732.37 frames. ], batch size: 81, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:01:46,440 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 12:01:47,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=542270.0, ans=0.125 2024-08-10 12:01:55,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=542370.0, ans=0.125 2024-08-10 12:01:56,966 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 7 from Vox, 27 fro AS 2024-08-10 12:02:02,514 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 12:02:22,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=542570.0, ans=0.07 2024-08-10 12:02:27,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=542570.0, ans=10.0 2024-08-10 12:02:29,060 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 12:02:31,625 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-10 12:02:46,000 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10800, loss[loss=0.1117, beats_loss=0.01327, ecapa_loss=0.0002353, whisper_loss=0.09606, over 23449.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01206, ecapa_loss=0.0002596, whisper_loss=0.09679, over 3865834.71 frames. ], batch size: 95, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:02:47,472 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 12:02:56,382 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 12:03:05,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=542870.0, ans=0.0 2024-08-10 12:03:15,758 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 12:03:17,123 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 3.065e+01 3.589e+01 4.278e+01 6.968e+01, threshold=7.178e+01, percent-clipped=0.0 2024-08-10 12:03:18,742 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 12:03:26,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=543070.0, ans=12.0 2024-08-10 12:03:41,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.42 vs. limit=15.0 2024-08-10 12:03:42,138 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 39 from LS+wenet, 11 from Vox, 44 fro AS 2024-08-10 12:03:45,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2024-08-10 12:03:53,961 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10850, loss[loss=0.09567, beats_loss=0.01239, ecapa_loss=0.0002001, whisper_loss=0.08128, over 19521.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01204, ecapa_loss=0.0002593, whisper_loss=0.09739, over 3884324.03 frames. ], batch size: 75, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:04:21,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=543470.0, ans=0.2 2024-08-10 12:04:30,622 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 12:04:42,524 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.47 vs. limit=15.0 2024-08-10 12:04:47,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=543570.0, ans=0.1 2024-08-10 12:05:01,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.28 vs. limit=6.0 2024-08-10 12:05:03,067 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10900, loss[loss=0.113, beats_loss=0.00961, ecapa_loss=0.0002475, whisper_loss=0.1009, over 20892.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01207, ecapa_loss=0.0002584, whisper_loss=0.09652, over 3886023.01 frames. ], batch size: 84, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:05:15,475 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 12:05:17,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=543870.0, ans=0.125 2024-08-10 12:05:29,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=543970.0, ans=0.1 2024-08-10 12:05:35,049 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 3.064e+01 3.469e+01 3.864e+01 6.688e+01, threshold=6.938e+01, percent-clipped=0.0 2024-08-10 12:05:38,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=543970.0, ans=0.125 2024-08-10 12:05:47,513 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.34 vs. 
limit=15.0 2024-08-10 12:05:51,591 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 12:05:52,685 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-10 12:05:57,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=544070.0, ans=0.0 2024-08-10 12:06:06,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=544170.0, ans=0.2 2024-08-10 12:06:11,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.52 vs. limit=15.0 2024-08-10 12:06:13,379 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 10950, loss[loss=0.1003, beats_loss=0.01113, ecapa_loss=0.0002377, whisper_loss=0.08677, over 14584.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01201, ecapa_loss=0.0002582, whisper_loss=0.09675, over 3916725.49 frames. ], batch size: 55, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:06:17,552 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 12:06:20,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=544270.0, ans=0.125 2024-08-10 12:06:27,456 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.30 vs. limit=10.0 2024-08-10 12:06:35,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=544370.0, ans=0.0 2024-08-10 12:06:52,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.47 vs. 
limit=12.0 2024-08-10 12:06:52,305 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.10 vs. limit=22.5 2024-08-10 12:06:52,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=544470.0, ans=22.5 2024-08-10 12:07:00,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=544570.0, ans=0.1 2024-08-10 12:07:08,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=544670.0, ans=0.2 2024-08-10 12:07:09,501 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 16 from LS+wenet, 31 from Vox, 46 fro AS 2024-08-10 12:07:09,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=544670.0, ans=0.0 2024-08-10 12:07:11,081 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-10 12:07:19,642 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11000, loss[loss=0.1023, beats_loss=0.01259, ecapa_loss=0.0002823, whisper_loss=0.0869, over 14823.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01191, ecapa_loss=0.0002611, whisper_loss=0.09661, over 3914231.11 frames. ], batch size: 60, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:07:23,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=544770.0, ans=0.07 2024-08-10 12:07:34,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=544870.0, ans=0.125 2024-08-10 12:07:38,506 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
27 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-10 12:07:50,197 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 2.903e+01 3.309e+01 3.802e+01 5.297e+01, threshold=6.618e+01, percent-clipped=0.0 2024-08-10 12:08:18,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=545170.0, ans=0.0 2024-08-10 12:08:25,650 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11050, loss[loss=0.1193, beats_loss=0.01363, ecapa_loss=0.0002962, whisper_loss=0.1027, over 21666.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01192, ecapa_loss=0.0002604, whisper_loss=0.09647, over 3931461.65 frames. ], batch size: 87, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:08:27,210 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 12:08:30,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=545270.0, ans=0.125 2024-08-10 12:08:31,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=545270.0, ans=0.125 2024-08-10 12:08:32,902 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 12:08:38,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=545370.0, ans=0.125 2024-08-10 12:08:59,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=545470.0, ans=0.07 2024-08-10 12:09:09,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=545570.0, ans=0.0 2024-08-10 12:09:28,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=545670.0, ans=0.2 2024-08-10 12:09:29,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=545670.0, ans=0.1 2024-08-10 12:09:31,586 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11100, loss[loss=0.106, beats_loss=0.01382, ecapa_loss=0.0002477, whisper_loss=0.08972, over 22299.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01201, ecapa_loss=0.000258, whisper_loss=0.09546, over 3890674.75 frames. ], batch size: 92, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:09:33,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=545770.0, ans=0.09899494936611666 2024-08-10 12:09:35,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=545770.0, ans=0.125 2024-08-10 12:09:46,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=545870.0, ans=0.0 2024-08-10 12:09:56,809 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=15.0 2024-08-10 12:10:00,123 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
22 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-10 12:10:02,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.318e+01 3.025e+01 3.487e+01 4.357e+01 7.811e+01, threshold=6.974e+01, percent-clipped=1.0 2024-08-10 12:10:14,922 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-10 12:10:16,207 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-10 12:10:29,116 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 12:10:32,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=546170.0, ans=0.125 2024-08-10 12:10:38,326 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11150, loss[loss=0.1059, beats_loss=0.01344, ecapa_loss=0.000222, whisper_loss=0.09026, over 18831.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01197, ecapa_loss=0.0002582, whisper_loss=0.09557, over 3861264.56 frames. ], batch size: 72, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:10:41,174 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 12:11:23,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=546570.0, ans=0.0 2024-08-10 12:11:31,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=546670.0, ans=0.125 2024-08-10 12:11:41,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=546670.0, ans=0.025 2024-08-10 12:11:44,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11200, loss[loss=0.1086, beats_loss=0.01252, ecapa_loss=0.0002367, whisper_loss=0.09367, over 19352.00 frames. 
], tot_loss[loss=0.1101, beats_loss=0.01199, ecapa_loss=0.0002581, whisper_loss=0.09549, over 3839825.07 frames. ], batch size: 75, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:11:50,774 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.92 vs. limit=22.5 2024-08-10 12:11:51,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=546770.0, ans=0.125 2024-08-10 12:12:04,698 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 12:12:09,917 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-10 12:12:10,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=546970.0, ans=0.125 2024-08-10 12:12:11,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=546970.0, ans=0.2 2024-08-10 12:12:12,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=546970.0, ans=0.125 2024-08-10 12:12:15,210 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 3.126e+01 3.422e+01 3.938e+01 7.786e+01, threshold=6.843e+01, percent-clipped=1.0 2024-08-10 12:12:23,964 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2024-08-10 12:12:26,153 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 16 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-10 12:12:35,726 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
19 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-10 12:12:36,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=547070.0, ans=0.09899494936611666 2024-08-10 12:12:39,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=547170.0, ans=0.125 2024-08-10 12:12:41,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=547170.0, ans=0.0 2024-08-10 12:12:51,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11250, loss[loss=0.1106, beats_loss=0.01125, ecapa_loss=0.0002634, whisper_loss=0.09674, over 17989.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01194, ecapa_loss=0.000257, whisper_loss=0.0956, over 3854784.60 frames. ], batch size: 72, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:12:58,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=547270.0, ans=0.125 2024-08-10 12:12:59,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=547270.0, ans=0.0 2024-08-10 12:13:10,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=12.0 2024-08-10 12:13:25,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=547470.0, ans=0.0 2024-08-10 12:13:30,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=547470.0, ans=0.1 2024-08-10 12:13:31,101 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
19 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 12:13:48,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=547670.0, ans=0.0 2024-08-10 12:13:49,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=547670.0, ans=0.125 2024-08-10 12:13:54,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=547670.0, ans=0.125 2024-08-10 12:13:58,969 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11300, loss[loss=0.1351, beats_loss=0.01064, ecapa_loss=0.0002726, whisper_loss=0.1218, over 20417.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01194, ecapa_loss=0.0002555, whisper_loss=0.09546, over 3865530.18 frames. ], batch size: 78, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:14:15,039 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 12:14:30,123 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 3.068e+01 3.483e+01 4.119e+01 9.369e+01, threshold=6.966e+01, percent-clipped=1.0 2024-08-10 12:14:30,525 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 12:14:32,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=547970.0, ans=0.1 2024-08-10 12:14:32,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.46 vs. limit=15.0 2024-08-10 12:14:34,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=547970.0, ans=0.125 2024-08-10 12:14:38,414 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
32 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-10 12:14:53,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=548170.0, ans=0.0 2024-08-10 12:14:55,339 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-10 12:15:03,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=548170.0, ans=0.2 2024-08-10 12:15:05,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11350, loss[loss=0.1124, beats_loss=0.01377, ecapa_loss=0.000251, whisper_loss=0.09614, over 21514.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01183, ecapa_loss=0.0002584, whisper_loss=0.09647, over 3875654.13 frames. ], batch size: 91, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:15:09,628 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-10 12:15:11,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=548270.0, ans=0.1 2024-08-10 12:15:14,786 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-10 12:15:15,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=548270.0, ans=0.2 2024-08-10 12:15:18,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=548370.0, ans=10.0 2024-08-10 12:15:28,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=548370.0, ans=0.2 2024-08-10 12:15:35,774 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
30 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 12:15:52,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=548570.0, ans=0.2 2024-08-10 12:15:55,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0 2024-08-10 12:15:56,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=548570.0, ans=0.125 2024-08-10 12:16:01,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548670.0, ans=0.1 2024-08-10 12:16:06,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=548670.0, ans=0.125 2024-08-10 12:16:11,596 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11400, loss[loss=0.1218, beats_loss=0.01302, ecapa_loss=0.0002858, whisper_loss=0.1059, over 22046.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01198, ecapa_loss=0.0002576, whisper_loss=0.09615, over 3879880.06 frames. ], batch size: 92, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:16:24,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=548870.0, ans=0.09899494936611666 2024-08-10 12:16:29,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=548870.0, ans=0.02 2024-08-10 12:16:32,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.15 vs. 
limit=15.0 2024-08-10 12:16:42,667 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.940e+01 3.301e+01 3.929e+01 5.377e+01, threshold=6.601e+01, percent-clipped=0.0 2024-08-10 12:16:44,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548970.0, ans=0.1 2024-08-10 12:16:53,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2024-08-10 12:17:06,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549170.0, ans=0.1 2024-08-10 12:17:09,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=549170.0, ans=0.0 2024-08-10 12:17:12,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=549170.0, ans=0.125 2024-08-10 12:17:18,471 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11450, loss[loss=0.1082, beats_loss=0.01202, ecapa_loss=0.0003017, whisper_loss=0.09315, over 21861.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01197, ecapa_loss=0.0002592, whisper_loss=0.09609, over 3883958.12 frames. ], batch size: 93, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:17:30,812 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 12:17:37,286 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 12:17:43,263 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 12:17:53,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=549470.0, ans=0.125 2024-08-10 12:17:57,426 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-10 12:17:59,477 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.83 vs. limit=15.0 2024-08-10 12:18:07,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=12.0 2024-08-10 12:18:08,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=15.0 2024-08-10 12:18:09,681 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 12:18:12,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=549670.0, ans=0.0 2024-08-10 12:18:12,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=549670.0, ans=0.125 2024-08-10 12:18:19,275 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 12:18:26,124 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11500, loss[loss=0.09811, beats_loss=0.01088, ecapa_loss=0.0002355, whisper_loss=0.08488, over 22903.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01195, ecapa_loss=0.0002595, whisper_loss=0.09583, over 3902297.62 frames. 
], batch size: 92, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:18:27,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=549770.0, ans=0.0 2024-08-10 12:18:50,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=549870.0, ans=0.125 2024-08-10 12:18:55,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=549970.0, ans=0.125 2024-08-10 12:18:56,236 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.942e+01 3.473e+01 3.989e+01 7.170e+01, threshold=6.945e+01, percent-clipped=1.0 2024-08-10 12:19:11,205 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 12:19:20,518 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-10 12:19:32,322 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11550, loss[loss=0.08195, beats_loss=0.01502, ecapa_loss=0.0002399, whisper_loss=0.06453, over 15124.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01186, ecapa_loss=0.00026, whisper_loss=0.09718, over 3925429.47 frames. ], batch size: 63, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:19:53,559 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 12:20:05,185 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
34 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 12:20:06,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=550470.0, ans=0.0 2024-08-10 12:20:13,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=550570.0, ans=0.0 2024-08-10 12:20:30,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=550670.0, ans=0.0 2024-08-10 12:20:38,005 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11600, loss[loss=0.1003, beats_loss=0.013, ecapa_loss=0.0002725, whisper_loss=0.08455, over 19835.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01184, ecapa_loss=0.0002595, whisper_loss=0.09728, over 3930338.01 frames. ], batch size: 83, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:20:38,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=550770.0, ans=0.125 2024-08-10 12:20:42,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=550770.0, ans=0.0 2024-08-10 12:20:44,006 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 12:21:01,122 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 12:21:08,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 3.001e+01 3.473e+01 4.016e+01 7.053e+01, threshold=6.947e+01, percent-clipped=1.0 2024-08-10 12:21:10,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=550970.0, ans=0.125 2024-08-10 12:21:12,906 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
20 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-10 12:21:13,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=550970.0, ans=0.0 2024-08-10 12:21:28,310 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 12:21:39,235 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 12:21:43,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=551170.0, ans=0.125 2024-08-10 12:21:46,221 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11650, loss[loss=0.1371, beats_loss=0.008828, ecapa_loss=0.0002421, whisper_loss=0.1259, over 15236.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01191, ecapa_loss=0.0002595, whisper_loss=0.09698, over 3925412.21 frames. ], batch size: 56, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:21:50,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=551270.0, ans=0.125 2024-08-10 12:21:54,175 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-10 12:21:56,886 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
22 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-10 12:22:12,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=551470.0, ans=0.125 2024-08-10 12:22:50,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=551670.0, ans=0.0 2024-08-10 12:22:52,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=551670.0, ans=0.1 2024-08-10 12:22:57,794 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11700, loss[loss=0.1185, beats_loss=0.01053, ecapa_loss=0.0002876, whisper_loss=0.1051, over 16429.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01207, ecapa_loss=0.0002578, whisper_loss=0.09655, over 3958788.59 frames. ], batch size: 67, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:23:05,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=551770.0, ans=0.1 2024-08-10 12:23:08,380 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-10 12:23:14,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=551870.0, ans=0.125 2024-08-10 12:23:33,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 3.233e+01 3.487e+01 4.046e+01 6.995e+01, threshold=6.974e+01, percent-clipped=1.0 2024-08-10 12:23:34,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=551970.0, ans=0.125 2024-08-10 12:23:39,587 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
19 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 12:23:39,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=551970.0, ans=0.125 2024-08-10 12:23:53,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=552070.0, ans=10.0 2024-08-10 12:24:13,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11750, loss[loss=0.09861, beats_loss=0.01357, ecapa_loss=0.000232, whisper_loss=0.08272, over 14275.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01202, ecapa_loss=0.0002593, whisper_loss=0.09715, over 3905825.90 frames. ], batch size: 58, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:24:25,902 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-10 12:24:37,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=552370.0, ans=0.125 2024-08-10 12:25:03,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=552570.0, ans=0.2 2024-08-10 12:25:11,095 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.21 vs. limit=15.0 2024-08-10 12:25:14,742 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.74 vs. limit=15.0 2024-08-10 12:25:28,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552770.0, ans=0.1 2024-08-10 12:25:29,251 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11800, loss[loss=0.1251, beats_loss=0.01159, ecapa_loss=0.0002442, whisper_loss=0.111, over 17618.00 frames. 
], tot_loss[loss=0.1116, beats_loss=0.01199, ecapa_loss=0.0002597, whisper_loss=0.09704, over 3883739.40 frames. ], batch size: 70, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:25:41,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=552770.0, ans=0.125 2024-08-10 12:25:48,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=552870.0, ans=0.125 2024-08-10 12:26:04,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.876e+01 3.467e+01 4.028e+01 7.288e+01, threshold=6.933e+01, percent-clipped=1.0 2024-08-10 12:26:15,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=553070.0, ans=0.125 2024-08-10 12:26:21,684 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 12:26:24,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=553070.0, ans=0.125 2024-08-10 12:26:28,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.41 vs. limit=10.0 2024-08-10 12:26:36,440 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 31 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 12:26:44,172 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11850, loss[loss=0.1117, beats_loss=0.009345, ecapa_loss=0.0002787, whisper_loss=0.09952, over 17955.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01203, ecapa_loss=0.0002591, whisper_loss=0.09714, over 3901507.30 frames. 
], batch size: 69, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:26:47,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=553270.0, ans=0.125 2024-08-10 12:26:47,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=553270.0, ans=0.125 2024-08-10 12:26:57,051 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 12:26:59,768 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 12:27:05,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=553370.0, ans=0.0 2024-08-10 12:27:23,976 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 12:27:28,680 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-10 12:27:57,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11900, loss[loss=0.08832, beats_loss=0.01191, ecapa_loss=0.0002891, whisper_loss=0.07352, over 14175.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01213, ecapa_loss=0.0002594, whisper_loss=0.09646, over 3910433.28 frames. ], batch size: 58, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:28:21,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=553870.0, ans=0.5 2024-08-10 12:28:27,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.34 vs. 
limit=15.0 2024-08-10 12:28:30,726 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 3.028e+01 3.380e+01 3.794e+01 5.730e+01, threshold=6.759e+01, percent-clipped=0.0 2024-08-10 12:28:37,487 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=15.0 2024-08-10 12:28:38,040 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 12:28:49,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=554070.0, ans=0.125 2024-08-10 12:28:54,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=554170.0, ans=10.0 2024-08-10 12:29:10,456 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 11950, loss[loss=0.105, beats_loss=0.01161, ecapa_loss=0.0002617, whisper_loss=0.09074, over 17174.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01201, ecapa_loss=0.0002591, whisper_loss=0.09686, over 3874997.47 frames. ], batch size: 70, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:29:13,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=554270.0, ans=0.125 2024-08-10 12:29:15,971 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-10 12:29:25,623 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
21 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-10 12:29:30,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=554370.0, ans=0.125 2024-08-10 12:29:37,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=554470.0, ans=0.125 2024-08-10 12:29:46,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=554470.0, ans=0.07 2024-08-10 12:29:56,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=554570.0, ans=0.0 2024-08-10 12:29:57,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2024-08-10 12:30:16,279 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.29 vs. limit=15.0 2024-08-10 12:30:22,877 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12000, loss[loss=0.07566, beats_loss=0.01244, ecapa_loss=0.0002849, whisper_loss=0.06036, over 15054.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01193, ecapa_loss=0.0002602, whisper_loss=0.09603, over 3823177.08 frames. ], batch size: 64, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:30:22,878 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 12:30:41,887 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9880, 3.4023, 3.4686, 3.9078], device='cuda:2') 2024-08-10 12:30:59,922 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on ASR_libri: loss=0.2637, beats_loss=0, ecapa_loss=0.0007919, whisper_loss=0.2558, over 922467.00 frames. 
2024-08-10 12:31:17,160 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on SV_voxceleb1: loss=0.006895, beats_loss=0, ecapa_loss=0.0006895, whisper_loss=0, over 939242.00 frames. 2024-08-10 12:33:04,228 INFO [train_multi_KD3.py:1149] (2/4) Epoch 4, validation on AT_audioset: loss=0.02758, beats_loss=0.02758, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 12:33:04,232 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 12:33:10,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.00 vs. limit=15.0 2024-08-10 12:33:14,994 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 14 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-10 12:33:20,173 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-10 12:33:26,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=554870.0, ans=0.1 2024-08-10 12:33:39,849 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 3.069e+01 3.344e+01 4.078e+01 6.277e+01, threshold=6.688e+01, percent-clipped=0.0 2024-08-10 12:33:43,813 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.87 vs. limit=15.0 2024-08-10 12:33:57,324 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 12:34:13,177 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 12:34:18,595 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
21 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 12:34:24,382 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12050, loss[loss=0.1195, beats_loss=0.01105, ecapa_loss=0.0002274, whisper_loss=0.1062, over 16547.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01205, ecapa_loss=0.0002597, whisper_loss=0.09542, over 3865387.66 frames. ], batch size: 64, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:34:24,632 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 12:34:26,433 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-10 12:34:28,603 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.11 vs. limit=10.0 2024-08-10 12:34:35,974 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-10 12:34:59,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=555470.0, ans=0.0 2024-08-10 12:35:08,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-10 12:35:25,787 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 12:35:31,727 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 12:35:38,490 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 12:35:40,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=555670.0, ans=0.0 2024-08-10 12:35:44,613 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
11 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-10 12:35:46,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=555670.0, ans=0.0 2024-08-10 12:35:49,836 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12100, loss[loss=0.1185, beats_loss=0.008356, ecapa_loss=0.0002617, whisper_loss=0.1076, over 17923.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01201, ecapa_loss=0.0002612, whisper_loss=0.09518, over 3842958.36 frames. ], batch size: 68, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:36:00,485 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 12:36:30,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.425e+01 2.918e+01 3.194e+01 3.735e+01 7.690e+01, threshold=6.389e+01, percent-clipped=2.0 2024-08-10 12:36:41,322 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 12:36:49,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=556070.0, ans=0.0 2024-08-10 12:36:50,893 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-10 12:36:52,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=556070.0, ans=0.125 2024-08-10 12:36:54,116 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 12:36:58,594 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 12:37:12,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=556170.0, ans=0.125 2024-08-10 12:37:15,245 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12150, loss[loss=0.1109, beats_loss=0.009177, ecapa_loss=0.0003241, whisper_loss=0.0985, over 21684.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01208, ecapa_loss=0.0002604, whisper_loss=0.09503, over 3841770.94 frames. ], batch size: 89, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:37:19,046 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0 2024-08-10 12:37:20,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=556270.0, ans=0.1 2024-08-10 12:37:49,288 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 12:37:52,848 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 20 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-10 12:38:03,794 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-10 12:38:34,069 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12200, loss[loss=0.1092, beats_loss=0.01151, ecapa_loss=0.0002623, whisper_loss=0.09503, over 15904.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01213, ecapa_loss=0.0002596, whisper_loss=0.09448, over 3814807.95 frames. ], batch size: 62, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:38:44,754 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
29 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 12:38:48,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5 2024-08-10 12:38:54,277 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 40 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-10 12:39:10,301 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.794e+01 3.177e+01 3.515e+01 6.137e+01, threshold=6.354e+01, percent-clipped=0.0 2024-08-10 12:39:36,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=557170.0, ans=0.2 2024-08-10 12:39:38,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=557170.0, ans=0.0 2024-08-10 12:39:54,915 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12250, loss[loss=0.06098, beats_loss=0.01541, ecapa_loss=0.000211, whisper_loss=0.04346, over 13054.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01209, ecapa_loss=0.0002581, whisper_loss=0.09481, over 3833035.08 frames. ], batch size: 55, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:39:56,762 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 12:40:06,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=557270.0, ans=0.125 2024-08-10 12:40:35,393 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=12.0 2024-08-10 12:40:36,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=557470.0, ans=0.07 2024-08-10 12:40:41,202 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
28 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 12:40:51,480 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 12:41:04,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=557670.0, ans=0.125 2024-08-10 12:41:08,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=557670.0, ans=0.5 2024-08-10 12:41:14,495 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12300, loss[loss=0.1275, beats_loss=0.01231, ecapa_loss=0.00024, whisper_loss=0.1128, over 17388.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01209, ecapa_loss=0.0002564, whisper_loss=0.09511, over 3837324.77 frames. ], batch size: 69, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:41:24,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=557770.0, ans=0.2 2024-08-10 12:41:48,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=557970.0, ans=0.0 2024-08-10 12:41:48,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2024-08-10 12:41:49,708 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 3.037e+01 3.524e+01 3.995e+01 1.053e+02, threshold=7.048e+01, percent-clipped=4.0 2024-08-10 12:42:03,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2024-08-10 12:42:20,359 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
35 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 12:42:23,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=558170.0, ans=0.05 2024-08-10 12:42:33,002 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12350, loss[loss=0.09864, beats_loss=0.01243, ecapa_loss=0.0002878, whisper_loss=0.08333, over 15703.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01199, ecapa_loss=0.0002604, whisper_loss=0.09623, over 3836127.59 frames. ], batch size: 68, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:42:34,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=558270.0, ans=0.0 2024-08-10 12:42:35,876 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 12:42:53,365 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 12:42:58,818 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 12:43:01,648 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 12:43:02,887 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 12:43:16,015 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=22.5 2024-08-10 12:43:20,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=558470.0, ans=0.0 2024-08-10 12:43:28,394 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 12:43:35,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.10 vs. 
limit=10.0 2024-08-10 12:43:42,315 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 12:43:48,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=558670.0, ans=0.125 2024-08-10 12:43:50,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=558670.0, ans=0.125 2024-08-10 12:43:53,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=558670.0, ans=0.0 2024-08-10 12:43:56,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=558670.0, ans=0.09899494936611666 2024-08-10 12:43:58,127 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-10 12:44:01,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12400, loss[loss=0.09769, beats_loss=0.01331, ecapa_loss=0.0002617, whisper_loss=0.08176, over 20449.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01197, ecapa_loss=0.0002592, whisper_loss=0.09657, over 3826439.86 frames. ], batch size: 83, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:44:10,450 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.38 vs. limit=15.0 2024-08-10 12:44:35,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558970.0, ans=0.1 2024-08-10 12:44:37,248 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 12:44:41,290 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.193e+01 2.951e+01 3.310e+01 3.895e+01 5.650e+01, threshold=6.619e+01, percent-clipped=0.0 2024-08-10 12:44:49,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=558970.0, ans=10.0 2024-08-10 12:44:53,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=559070.0, ans=0.125 2024-08-10 12:44:56,484 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 12:45:07,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=559170.0, ans=0.1 2024-08-10 12:45:11,312 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 12:45:15,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=559170.0, ans=0.125 2024-08-10 12:45:19,315 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 12:45:26,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12450, loss[loss=0.08586, beats_loss=0.01486, ecapa_loss=0.0002204, whisper_loss=0.0688, over 20252.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01198, ecapa_loss=0.0002608, whisper_loss=0.09662, over 3837933.28 frames. ], batch size: 82, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:45:31,965 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.71 vs. limit=22.5 2024-08-10 12:45:35,634 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
14 from LS+wenet, 15 from Vox, 29 from AS
2024-08-10 12:45:41,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559370.0, ans=0.1
2024-08-10 12:45:43,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=559370.0, ans=0.2
2024-08-10 12:45:51,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=559370.0, ans=0.125
2024-08-10 12:46:08,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=559470.0, ans=0.2
2024-08-10 12:46:26,442 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 22 from Vox, 33 from AS
2024-08-10 12:46:27,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0
2024-08-10 12:46:30,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0
2024-08-10 12:46:37,945 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 from AS
2024-08-10 12:46:45,701 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12500, loss[loss=0.1226, beats_loss=0.01106, ecapa_loss=0.0002566, whisper_loss=0.109, over 22998.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01212, ecapa_loss=0.0002591, whisper_loss=0.09592, over 3874182.77 frames.
], batch size: 90, lr: 1.43e-02, grad_scale: 4294967296.0
2024-08-10 12:46:59,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=559770.0, ans=0.125
2024-08-10 12:47:07,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=559870.0, ans=0.125
2024-08-10 12:47:11,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.17 vs. limit=15.0
2024-08-10 12:47:21,123 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=22.5
2024-08-10 12:47:28,915 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+01 3.210e+01 3.616e+01 4.037e+01 8.521e+01, threshold=7.231e+01, percent-clipped=2.0
2024-08-10 12:47:40,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=560070.0, ans=0.125
2024-08-10 12:47:43,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=560070.0, ans=0.0
2024-08-10 12:47:43,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=560070.0, ans=0.125
2024-08-10 12:47:43,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=560070.0, ans=0.1
2024-08-10 12:47:46,601 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 19 from Vox, 20 from AS
2024-08-10 12:47:54,665 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 from AS
2024-08-10 12:48:01,532 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts.
22 from LS+wenet, 20 from Vox, 27 from AS
2024-08-10 12:48:04,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=560170.0, ans=0.1
2024-08-10 12:48:12,591 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12550, loss[loss=0.1111, beats_loss=0.01176, ecapa_loss=0.0002873, whisper_loss=0.09645, over 20886.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01214, ecapa_loss=0.0002587, whisper_loss=0.09545, over 3881962.67 frames. ], batch size: 90, lr: 1.43e-02, grad_scale: 8589934592.0
2024-08-10 12:48:12,706 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 14 from Vox, 26 from AS
2024-08-10 12:48:15,808 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 from AS
2024-08-10 12:48:20,936 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 11 from Vox, 30 from AS
2024-08-10 12:48:22,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=560270.0, ans=0.0
2024-08-10 12:48:36,307 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 from AS
2024-08-10 12:49:09,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.80 vs. limit=22.5
2024-08-10 12:49:17,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=560670.0, ans=0.1
2024-08-10 12:49:26,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=560670.0, ans=0.0
2024-08-10 12:49:29,713 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12600, loss[loss=0.07589, beats_loss=0.01201, ecapa_loss=0.0002439, whisper_loss=0.06144, over 16870.00 frames.
], tot_loss[loss=0.1102, beats_loss=0.01219, ecapa_loss=0.0002593, whisper_loss=0.09542, over 3875281.14 frames. ], batch size: 67, lr: 1.43e-02, grad_scale: 8589934592.0
2024-08-10 12:49:50,066 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 from AS
2024-08-10 12:50:00,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=560970.0, ans=0.0
2024-08-10 12:50:00,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=560970.0, ans=0.0
2024-08-10 12:50:06,296 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.385e+01 3.110e+01 3.572e+01 4.096e+01 7.155e+01, threshold=7.143e+01, percent-clipped=0.0
2024-08-10 12:50:26,146 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 from AS
2024-08-10 12:50:36,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=561170.0, ans=0.2
2024-08-10 12:50:38,446 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.788e-01
2024-08-10 12:50:39,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=561170.0, ans=0.0
2024-08-10 12:50:42,388 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 from AS
2024-08-10 12:50:46,734 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12650, loss[loss=0.08651, beats_loss=0.01402, ecapa_loss=0.0002798, whisper_loss=0.06969, over 17736.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01223, ecapa_loss=0.0002579, whisper_loss=0.09533, over 3850635.11 frames. ], batch size: 75, lr: 1.43e-02, grad_scale: 8589934592.0
2024-08-10 12:50:50,721 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts.
21 from LS+wenet, 19 from Vox, 22 from AS
2024-08-10 12:50:55,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=561270.0, ans=0.0
2024-08-10 12:50:57,481 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 21 from Vox, 33 from AS
2024-08-10 12:51:44,137 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.46 vs. limit=6.0
2024-08-10 12:51:52,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=561670.0, ans=0.1
2024-08-10 12:51:58,850 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0
2024-08-10 12:51:59,446 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 from AS
2024-08-10 12:52:01,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561670.0, ans=0.1
2024-08-10 12:52:08,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12700, loss[loss=0.1286, beats_loss=0.01134, ecapa_loss=0.0002248, whisper_loss=0.115, over 23351.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01213, ecapa_loss=0.0002573, whisper_loss=0.09574, over 3839523.81 frames.
], batch size: 91, lr: 1.43e-02, grad_scale: 8589934592.0
2024-08-10 12:52:14,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=561770.0, ans=0.125
2024-08-10 12:52:32,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=561870.0, ans=0.2
2024-08-10 12:52:44,860 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.862e+01 3.101e+01 3.673e+01 6.463e+01, threshold=6.201e+01, percent-clipped=0.0
2024-08-10 12:52:52,366 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 23 from Vox, 33 from AS
2024-08-10 12:53:03,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=562070.0, ans=0.0
2024-08-10 12:53:26,710 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12750, loss[loss=0.1065, beats_loss=0.01552, ecapa_loss=0.0002529, whisper_loss=0.08845, over 18592.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01222, ecapa_loss=0.0002585, whisper_loss=0.09507, over 3850905.31 frames. ], batch size: 77, lr: 1.43e-02, grad_scale: 8589934592.0
2024-08-10 12:53:53,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=562370.0, ans=0.125
2024-08-10 12:54:04,957 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 from AS
2024-08-10 12:54:06,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=562470.0, ans=0.125
2024-08-10 12:54:26,798 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
28 from LS+wenet, 25 from Vox, 39 from AS
2024-08-10 12:54:38,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=562670.0, ans=0.1
2024-08-10 12:54:42,207 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12800, loss[loss=0.1102, beats_loss=0.01078, ecapa_loss=0.0002267, whisper_loss=0.09718, over 19570.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01227, ecapa_loss=0.0002607, whisper_loss=0.0952, over 3877221.07 frames. ], batch size: 74, lr: 1.43e-02, grad_scale: 8589934592.0
2024-08-10 12:54:44,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=562770.0, ans=0.125
2024-08-10 12:54:45,845 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 24 from Vox, 30 from AS
2024-08-10 12:54:51,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=562770.0, ans=0.1
2024-08-10 12:54:59,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=562870.0, ans=0.0
2024-08-10 12:55:03,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0
2024-08-10 12:55:15,401 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
29 from LS+wenet, 21 from Vox, 37 from AS
2024-08-10 12:55:15,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=562970.0, ans=0.1
2024-08-10 12:55:17,461 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.950e+01 3.581e+01 4.034e+01 6.155e+01, threshold=7.162e+01, percent-clipped=0.0
2024-08-10 12:55:25,897 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5
2024-08-10 12:55:55,732 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12850, loss[loss=0.1196, beats_loss=0.01205, ecapa_loss=0.0002435, whisper_loss=0.1051, over 16967.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01226, ecapa_loss=0.000262, whisper_loss=0.09478, over 3819133.42 frames. ], batch size: 64, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 12:55:59,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=563270.0, ans=0.025
2024-08-10 12:56:08,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=563270.0, ans=0.125
2024-08-10 12:56:12,120 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 14 from Vox, 22 from AS
2024-08-10 12:56:12,780 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.77 vs. limit=15.0
2024-08-10 12:56:19,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=563370.0, ans=0.125
2024-08-10 12:56:19,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.40 vs.
limit=22.5
2024-08-10 12:56:47,055 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 14 from LS+wenet, 20 from Vox, 30 from AS
2024-08-10 12:57:05,798 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12900, loss[loss=0.1095, beats_loss=0.01344, ecapa_loss=0.0002651, whisper_loss=0.09337, over 22328.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01228, ecapa_loss=0.0002598, whisper_loss=0.09414, over 3820468.20 frames. ], batch size: 90, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 12:57:16,368 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 from AS
2024-08-10 12:57:21,824 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 25 from Vox, 27 from AS
2024-08-10 12:57:24,430 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 from AS
2024-08-10 12:57:38,389 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.781e+01 3.198e+01 3.852e+01 6.418e+01, threshold=6.396e+01, percent-clipped=0.0
2024-08-10 12:57:44,682 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.77 vs. limit=10.0
2024-08-10 12:58:05,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=564170.0, ans=0.125
2024-08-10 12:58:07,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=15.0
2024-08-10 12:58:15,918 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 12950, loss[loss=0.1378, beats_loss=0.008556, ecapa_loss=0.0002454, whisper_loss=0.1268, over 17441.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01207, ecapa_loss=0.0002611, whisper_loss=0.09543, over 3807991.55 frames.
], batch size: 65, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 12:58:26,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.81 vs. limit=22.5
2024-08-10 12:58:37,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=564370.0, ans=0.0
2024-08-10 12:58:42,214 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 from AS
2024-08-10 12:59:01,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0
2024-08-10 12:59:14,903 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0
2024-08-10 12:59:17,171 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 from AS
2024-08-10 12:59:23,836 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13000, loss[loss=0.1177, beats_loss=0.01334, ecapa_loss=0.0002074, whisper_loss=0.1023, over 23083.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01208, ecapa_loss=0.0002602, whisper_loss=0.09606, over 3846975.11 frames.
], batch size: 89, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 12:59:27,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=564770.0, ans=0.125
2024-08-10 12:59:54,713 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.945e+01 3.366e+01 4.220e+01 5.870e+01, threshold=6.733e+01, percent-clipped=0.0
2024-08-10 13:00:03,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=565070.0, ans=0.125
2024-08-10 13:00:10,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=565070.0, ans=0.2
2024-08-10 13:00:32,708 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13050, loss[loss=0.1301, beats_loss=0.01098, ecapa_loss=0.0002234, whisper_loss=0.1169, over 14069.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01214, ecapa_loss=0.0002594, whisper_loss=0.09558, over 3857896.63 frames.
], batch size: 55, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 13:00:38,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=565270.0, ans=0.0
2024-08-10 13:00:38,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=565270.0, ans=0.1
2024-08-10 13:00:51,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=565370.0, ans=0.2
2024-08-10 13:01:03,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=565470.0, ans=0.0
2024-08-10 13:01:05,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=565470.0, ans=0.0
2024-08-10 13:01:13,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0
2024-08-10 13:01:31,145 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 24 from Vox, 30 from AS
2024-08-10 13:01:37,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0
2024-08-10 13:01:42,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=565670.0, ans=0.2
2024-08-10 13:01:48,430 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13100, loss[loss=0.1078, beats_loss=0.01361, ecapa_loss=0.0002216, whisper_loss=0.09193, over 14450.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01211, ecapa_loss=0.0002582, whisper_loss=0.09573, over 3895499.16 frames. ], batch size: 55, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 13:01:58,765 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
28 from LS+wenet, 27 from Vox, 39 from AS
2024-08-10 13:02:15,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=565870.0, ans=0.125
2024-08-10 13:02:26,204 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 3.014e+01 3.400e+01 3.856e+01 6.675e+01, threshold=6.801e+01, percent-clipped=0.0
2024-08-10 13:02:27,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=15.0
2024-08-10 13:03:10,267 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13150, loss[loss=0.1198, beats_loss=0.01083, ecapa_loss=0.0003025, whisper_loss=0.106, over 18583.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01198, ecapa_loss=0.0002583, whisper_loss=0.09657, over 3904382.56 frames. ], batch size: 75, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 13:03:12,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=566270.0, ans=0.0
2024-08-10 13:03:13,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=566270.0, ans=0.125
2024-08-10 13:03:15,150 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 29 from Vox, 38 from AS
2024-08-10 13:03:19,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.89 vs.
limit=15.0
2024-08-10 13:03:23,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=566270.0, ans=0.125
2024-08-10 13:03:23,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=566270.0, ans=0.125
2024-08-10 13:03:40,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=566470.0, ans=0.125
2024-08-10 13:03:42,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=566470.0, ans=0.035
2024-08-10 13:03:58,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=566570.0, ans=0.2
2024-08-10 13:03:58,917 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.94 vs. limit=15.0
2024-08-10 13:04:06,304 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 19 from Vox, 48 from AS
2024-08-10 13:04:14,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=566670.0, ans=0.125
2024-08-10 13:04:31,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13200, loss[loss=0.09785, beats_loss=0.01092, ecapa_loss=0.0002573, whisper_loss=0.08435, over 18587.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01192, ecapa_loss=0.000259, whisper_loss=0.09662, over 3885190.50 frames. ], batch size: 73, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 13:04:38,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=566770.0, ans=0.0
2024-08-10 13:04:51,895 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts.
24 from LS+wenet, 17 from Vox, 28 from AS
2024-08-10 13:04:52,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=566870.0, ans=0.0
2024-08-10 13:04:59,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.65 vs. limit=15.0
2024-08-10 13:05:03,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.48 vs. limit=10.0
2024-08-10 13:05:06,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=566970.0, ans=0.0
2024-08-10 13:05:07,730 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.201e+01 2.868e+01 3.486e+01 3.850e+01 5.808e+01, threshold=6.972e+01, percent-clipped=0.0
2024-08-10 13:05:19,544 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 31 from Vox, 35 from AS
2024-08-10 13:05:29,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=567070.0, ans=0.125
2024-08-10 13:05:33,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=567170.0, ans=0.2
2024-08-10 13:05:49,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.76 vs. limit=15.0
2024-08-10 13:05:50,045 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13250, loss[loss=0.09972, beats_loss=0.01231, ecapa_loss=0.0002593, whisper_loss=0.08482, over 21934.00 frames. ], tot_loss[loss=0.111, beats_loss=0.0119, ecapa_loss=0.0002598, whisper_loss=0.09649, over 3880909.42 frames.
], batch size: 91, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 13:05:52,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=567270.0, ans=0.04949747468305833
2024-08-10 13:05:59,138 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.315e-02
2024-08-10 13:07:11,895 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13300, loss[loss=0.08552, beats_loss=0.01566, ecapa_loss=0.0001967, whisper_loss=0.06789, over 18569.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01191, ecapa_loss=0.0002587, whisper_loss=0.09659, over 3880523.74 frames. ], batch size: 73, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 13:07:26,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=567870.0, ans=0.035
2024-08-10 13:07:37,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=567870.0, ans=0.1
2024-08-10 13:07:37,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=567870.0, ans=0.125
2024-08-10 13:07:48,440 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 3.028e+01 3.547e+01 3.970e+01 7.425e+01, threshold=7.095e+01, percent-clipped=1.0
2024-08-10 13:07:50,319 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
28 from LS+wenet, 26 from Vox, 37 from AS
2024-08-10 13:07:50,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=567970.0, ans=0.125
2024-08-10 13:08:01,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=568070.0, ans=0.07
2024-08-10 13:08:16,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=568170.0, ans=0.125
2024-08-10 13:08:28,706 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 26 from Vox, 37 from AS
2024-08-10 13:08:30,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=568270.0, ans=0.2
2024-08-10 13:08:31,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13350, loss[loss=0.1088, beats_loss=0.01138, ecapa_loss=0.0003455, whisper_loss=0.09393, over 16463.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01198, ecapa_loss=0.0002585, whisper_loss=0.09614, over 3867067.58 frames. ], batch size: 69, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 13:08:33,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=568270.0, ans=0.05
2024-08-10 13:09:16,643 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 from AS
2024-08-10 13:09:46,854 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13400, loss[loss=0.0926, beats_loss=0.01368, ecapa_loss=0.0002465, whisper_loss=0.07645, over 22036.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01201, ecapa_loss=0.0002603, whisper_loss=0.09559, over 3841119.41 frames. ], batch size: 92, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 13:09:53,328 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
21 from LS+wenet, 21 from Vox, 48 from AS
2024-08-10 13:10:05,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=568870.0, ans=0.125
2024-08-10 13:10:16,066 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 from AS
2024-08-10 13:10:17,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=568970.0, ans=10.0
2024-08-10 13:10:18,354 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 2.910e+01 3.380e+01 3.958e+01 6.126e+01, threshold=6.760e+01, percent-clipped=0.0
2024-08-10 13:10:18,570 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 from AS
2024-08-10 13:10:25,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=568970.0, ans=0.1
2024-08-10 13:10:33,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569070.0, ans=0.1
2024-08-10 13:10:35,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=569070.0, ans=0.0
2024-08-10 13:10:50,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0
2024-08-10 13:10:51,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=569170.0, ans=0.2
2024-08-10 13:10:51,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=569170.0, ans=0.0
2024-08-10 13:10:56,401 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13450, loss[loss=0.1079, beats_loss=0.01218, ecapa_loss=0.0002557, whisper_loss=0.09312, over 22352.00 frames.
], tot_loss[loss=0.11, beats_loss=0.01204, ecapa_loss=0.0002591, whisper_loss=0.09541, over 3882110.91 frames. ], batch size: 91, lr: 1.42e-02, grad_scale: 8589934592.0
2024-08-10 13:10:58,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569270.0, ans=0.1
2024-08-10 13:11:04,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.42 vs. limit=10.0
2024-08-10 13:11:09,392 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 25 from LS+wenet, 16 from Vox, 16 from AS
2024-08-10 13:11:11,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=569370.0, ans=0.2
2024-08-10 13:11:30,116 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 from AS
2024-08-10 13:11:32,661 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 22 from Vox, 25 from AS
2024-08-10 13:11:45,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=569570.0, ans=0.2
2024-08-10 13:11:52,392 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 19 from Vox, 48 from AS
2024-08-10 13:11:55,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=569670.0, ans=15.0
2024-08-10 13:11:59,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=569670.0, ans=0.125
2024-08-10 13:12:04,053 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13500, loss[loss=0.1089, beats_loss=0.01044, ecapa_loss=0.0002694, whisper_loss=0.09572, over 17078.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01207, ecapa_loss=0.000258, whisper_loss=0.0957, over 3869195.96 frames.
], batch size: 69, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:12:13,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=569770.0, ans=0.125 2024-08-10 13:12:16,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=569870.0, ans=0.0 2024-08-10 13:12:17,857 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 13:12:35,743 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.211e+01 2.996e+01 3.434e+01 4.154e+01 6.721e+01, threshold=6.868e+01, percent-clipped=0.0 2024-08-10 13:12:41,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0 2024-08-10 13:12:42,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=569970.0, ans=0.125 2024-08-10 13:12:50,319 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 13:12:53,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=570070.0, ans=0.125 2024-08-10 13:13:11,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13550, loss[loss=0.1189, beats_loss=0.009886, ecapa_loss=0.0002452, whisper_loss=0.1065, over 22562.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01206, ecapa_loss=0.0002573, whisper_loss=0.09615, over 3864604.31 frames. 
], batch size: 89, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:13:12,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570270.0, ans=0.1 2024-08-10 13:13:21,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=570270.0, ans=0.07 2024-08-10 13:13:22,482 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 13:13:23,734 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 13:13:39,736 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 13:13:42,814 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.65 vs. limit=22.5 2024-08-10 13:13:57,854 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 13:13:59,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=570570.0, ans=0.07 2024-08-10 13:14:05,627 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 13:14:05,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=570670.0, ans=0.1 2024-08-10 13:14:08,184 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 13:14:16,932 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13600, loss[loss=0.1265, beats_loss=0.01252, ecapa_loss=0.0002486, whisper_loss=0.1115, over 23400.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01205, ecapa_loss=0.0002574, whisper_loss=0.09687, over 3896107.42 frames. 
], batch size: 91, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:14:22,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=570770.0, ans=0.0 2024-08-10 13:14:27,401 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 19 from LS+wenet, 22 from Vox, 50 fro AS 2024-08-10 13:14:27,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-10 13:14:33,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=570870.0, ans=0.0 2024-08-10 13:14:37,999 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 13:14:44,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=570970.0, ans=0.2 2024-08-10 13:14:47,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 2.840e+01 3.248e+01 3.798e+01 4.801e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-10 13:14:55,656 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 13:15:01,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=571070.0, ans=0.2 2024-08-10 13:15:06,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0 2024-08-10 13:15:11,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=571170.0, ans=0.125 2024-08-10 13:15:15,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=571170.0, ans=0.125 2024-08-10 13:15:19,011 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 13:15:21,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=571270.0, ans=0.0 2024-08-10 13:15:22,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13650, loss[loss=0.1164, beats_loss=0.01225, ecapa_loss=0.0002408, whisper_loss=0.1018, over 21352.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01206, ecapa_loss=0.0002567, whisper_loss=0.09636, over 3876047.23 frames. ], batch size: 84, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:15:24,036 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 13:15:27,133 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.40 vs. limit=10.0 2024-08-10 13:15:27,785 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 13:16:08,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=571570.0, ans=0.0 2024-08-10 13:16:08,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=571570.0, ans=0.125 2024-08-10 13:16:11,362 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 13:16:12,856 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 13:16:20,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=571670.0, ans=0.125 2024-08-10 13:16:30,569 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13700, loss[loss=0.1044, beats_loss=0.01453, ecapa_loss=0.000239, whisper_loss=0.08746, over 21291.00 frames. 
], tot_loss[loss=0.1105, beats_loss=0.01212, ecapa_loss=0.0002562, whisper_loss=0.09586, over 3894202.94 frames. ], batch size: 87, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:16:47,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.56 vs. limit=22.5 2024-08-10 13:16:59,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=571970.0, ans=0.2 2024-08-10 13:17:01,538 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.889e+01 3.237e+01 4.000e+01 5.503e+01, threshold=6.474e+01, percent-clipped=0.0 2024-08-10 13:17:01,676 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 13:17:22,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=572070.0, ans=0.125 2024-08-10 13:17:22,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=572070.0, ans=0.02 2024-08-10 13:17:26,017 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-10 13:17:32,811 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 41 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 13:17:37,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=572270.0, ans=0.125 2024-08-10 13:17:38,010 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13750, loss[loss=0.1238, beats_loss=0.009949, ecapa_loss=0.0002975, whisper_loss=0.1108, over 22551.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01202, ecapa_loss=0.0002579, whisper_loss=0.09627, over 3903002.84 frames. 
], batch size: 91, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:18:05,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=572470.0, ans=0.0 2024-08-10 13:18:06,737 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 13:18:19,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=572570.0, ans=0.5 2024-08-10 13:18:27,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=572570.0, ans=0.0 2024-08-10 13:18:28,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=572570.0, ans=0.2 2024-08-10 13:18:38,240 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 13:18:46,274 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13800, loss[loss=0.1082, beats_loss=0.009761, ecapa_loss=0.0002275, whisper_loss=0.09616, over 16156.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01196, ecapa_loss=0.000257, whisper_loss=0.097, over 3898521.34 frames. ], batch size: 62, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:19:04,819 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 13:19:05,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=572870.0, ans=0.125 2024-08-10 13:19:18,111 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.346e+01 2.990e+01 3.419e+01 4.092e+01 5.899e+01, threshold=6.838e+01, percent-clipped=0.0 2024-08-10 13:19:28,011 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 13:19:30,631 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
32 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 13:19:35,926 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 13:19:54,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2024-08-10 13:19:54,803 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13850, loss[loss=0.1187, beats_loss=0.01053, ecapa_loss=0.0003259, whisper_loss=0.1049, over 21931.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01184, ecapa_loss=0.0002594, whisper_loss=0.09752, over 3933868.27 frames. ], batch size: 93, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:20:03,523 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 13:20:07,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=573370.0, ans=0.0 2024-08-10 13:20:07,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=573370.0, ans=0.04949747468305833 2024-08-10 13:20:18,445 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 13:20:20,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=8.0 2024-08-10 13:20:31,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=12.0 2024-08-10 13:20:39,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=573570.0, ans=0.125 2024-08-10 13:20:40,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=573570.0, ans=0.0 2024-08-10 13:20:55,480 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
29 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 13:21:03,885 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13900, loss[loss=0.1348, beats_loss=0.009143, ecapa_loss=0.0002562, whisper_loss=0.1231, over 18060.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01183, ecapa_loss=0.0002592, whisper_loss=0.09803, over 3939198.34 frames. ], batch size: 69, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:21:06,660 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 13:21:09,313 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 13:21:12,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=573770.0, ans=0.0 2024-08-10 13:21:17,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=573870.0, ans=0.125 2024-08-10 13:21:20,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=573870.0, ans=0.125 2024-08-10 13:21:22,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=573870.0, ans=0.125 2024-08-10 13:21:26,175 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-10 13:21:35,192 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 3.046e+01 3.391e+01 3.778e+01 5.936e+01, threshold=6.783e+01, percent-clipped=0.0 2024-08-10 13:21:44,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=574070.0, ans=0.0 2024-08-10 13:21:53,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. 
limit=6.0 2024-08-10 13:22:08,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=574170.0, ans=0.0 2024-08-10 13:22:13,093 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 13950, loss[loss=0.1249, beats_loss=0.01208, ecapa_loss=0.0001925, whisper_loss=0.1109, over 21206.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01191, ecapa_loss=0.0002575, whisper_loss=0.09716, over 3911158.36 frames. ], batch size: 77, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:22:13,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=574270.0, ans=0.125 2024-08-10 13:22:16,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=574270.0, ans=0.035 2024-08-10 13:22:20,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=574270.0, ans=0.125 2024-08-10 13:22:25,789 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.304e+01 2024-08-10 13:22:39,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=574470.0, ans=0.1 2024-08-10 13:22:51,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=574470.0, ans=0.125 2024-08-10 13:22:54,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=574570.0, ans=0.0 2024-08-10 13:22:56,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=574570.0, ans=0.1 2024-08-10 13:23:00,554 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
15 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 13:23:04,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=574570.0, ans=0.0 2024-08-10 13:23:20,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=574670.0, ans=0.125 2024-08-10 13:23:22,574 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 14000, loss[loss=0.1007, beats_loss=0.01294, ecapa_loss=0.0003341, whisper_loss=0.08444, over 15816.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.0119, ecapa_loss=0.0002569, whisper_loss=0.09728, over 3874371.00 frames. ], batch size: 69, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:23:24,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=574770.0, ans=0.0 2024-08-10 13:23:25,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.32 vs. limit=15.0 2024-08-10 13:23:28,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=574770.0, ans=0.125 2024-08-10 13:23:35,536 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 13:23:48,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.43 vs. limit=10.0 2024-08-10 13:23:54,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=574970.0, ans=0.1 2024-08-10 13:23:55,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.914e+01 3.235e+01 3.866e+01 6.339e+01, threshold=6.469e+01, percent-clipped=0.0 2024-08-10 13:23:59,623 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
21 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 13:24:09,175 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 13:24:14,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2024-08-10 13:24:15,366 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0 2024-08-10 13:24:25,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=575170.0, ans=0.125 2024-08-10 13:24:34,135 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 14050, loss[loss=0.118, beats_loss=0.01401, ecapa_loss=0.0002036, whisper_loss=0.1019, over 22827.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01188, ecapa_loss=0.000257, whisper_loss=0.09705, over 3886781.97 frames. ], batch size: 89, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:24:35,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-10 13:24:48,345 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 13:25:04,626 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 13:25:21,999 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-10 13:25:47,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 14100, loss[loss=0.1128, beats_loss=0.01232, ecapa_loss=0.0003524, whisper_loss=0.09699, over 20802.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01194, ecapa_loss=0.0002554, whisper_loss=0.09667, over 3900073.76 frames. 
], batch size: 91, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:25:57,833 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.28 vs. limit=15.0 2024-08-10 13:26:05,588 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-10 13:26:25,061 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.359e+01 3.001e+01 3.648e+01 4.223e+01 8.641e+01, threshold=7.295e+01, percent-clipped=2.0 2024-08-10 13:26:27,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=575970.0, ans=0.125 2024-08-10 13:26:43,019 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 22 from LS+wenet, 22 from Vox, 52 fro AS 2024-08-10 13:26:49,353 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 12 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 13:26:56,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=576170.0, ans=0.125 2024-08-10 13:26:58,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=576170.0, ans=0.1 2024-08-10 13:27:04,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=576170.0, ans=0.0 2024-08-10 13:27:08,015 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 14150, loss[loss=0.111, beats_loss=0.01259, ecapa_loss=0.000258, whisper_loss=0.0958, over 12519.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01201, ecapa_loss=0.0002561, whisper_loss=0.09606, over 3894451.38 frames. 
], batch size: 53, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:27:21,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=576270.0, ans=0.125 2024-08-10 13:28:21,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=576670.0, ans=0.125 2024-08-10 13:28:24,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-08-10 13:28:32,918 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 14200, loss[loss=0.1209, beats_loss=0.01204, ecapa_loss=0.0001722, whisper_loss=0.1072, over 16017.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01206, ecapa_loss=0.0002553, whisper_loss=0.09583, over 3880393.07 frames. ], batch size: 59, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:28:34,800 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 13:28:53,395 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-10 13:29:06,281 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2024-08-10 13:29:17,149 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.473e+01 3.094e+01 3.432e+01 3.863e+01 7.530e+01, threshold=6.863e+01, percent-clipped=1.0 2024-08-10 13:29:27,551 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-10 13:29:46,583 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-10 13:29:56,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=577170.0, ans=0.04949747468305833 2024-08-10 13:30:08,985 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 14250, loss[loss=0.09145, beats_loss=0.01252, ecapa_loss=0.0002878, whisper_loss=0.07605, over 13919.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.012, ecapa_loss=0.0002538, whisper_loss=0.09558, over 3875985.32 frames. ], batch size: 57, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:30:37,198 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 13:30:50,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=577470.0, ans=0.2 2024-08-10 13:31:32,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2024-08-10 13:31:33,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=577670.0, ans=0.125 2024-08-10 13:31:35,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=577670.0, ans=0.0 2024-08-10 13:31:56,007 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 14300, loss[loss=0.097, beats_loss=0.01312, ecapa_loss=0.0002562, whisper_loss=0.08131, over 20787.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01196, ecapa_loss=0.0002538, whisper_loss=0.09595, over 3899792.17 frames. 
], batch size: 84, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:32:16,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=577870.0, ans=0.125 2024-08-10 13:32:43,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=577970.0, ans=0.2 2024-08-10 13:32:44,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.908e+01 3.226e+01 3.811e+01 6.354e+01, threshold=6.452e+01, percent-clipped=0.0 2024-08-10 13:32:55,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-08-10 13:33:03,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=578070.0, ans=0.125 2024-08-10 13:33:10,441 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.90 vs. limit=10.0 2024-08-10 13:33:42,564 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 14350, loss[loss=0.1256, beats_loss=0.009908, ecapa_loss=0.00024, whisper_loss=0.1132, over 17446.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01196, ecapa_loss=0.0002536, whisper_loss=0.0957, over 3902516.78 frames. ], batch size: 65, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:33:48,134 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-10 13:33:49,446 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 19 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-10 13:34:03,460 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 13:34:11,382 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
28 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-10 13:34:24,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=578570.0, ans=0.125 2024-08-10 13:34:32,403 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.46 vs. limit=22.5 2024-08-10 13:34:45,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.14 vs. limit=10.0 2024-08-10 13:34:50,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=578670.0, ans=0.1 2024-08-10 13:34:50,902 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.22 vs. limit=15.0 2024-08-10 13:34:52,320 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 14400, loss[loss=0.09781, beats_loss=0.01481, ecapa_loss=0.0002225, whisper_loss=0.08077, over 22218.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01195, ecapa_loss=0.000255, whisper_loss=0.09637, over 3914327.47 frames. ], batch size: 89, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:34:54,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=578770.0, ans=0.1 2024-08-10 13:34:55,900 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 13:35:14,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=578870.0, ans=0.1 2024-08-10 13:35:15,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578870.0, ans=0.1 2024-08-10 13:35:24,961 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 3.244e+01 3.522e+01 4.448e+01 1.287e+02, threshold=7.043e+01, percent-clipped=5.0 2024-08-10 13:35:25,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=578970.0, ans=0.125 2024-08-10 13:35:31,796 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-10 13:35:36,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=579070.0, ans=0.09899494936611666 2024-08-10 13:35:42,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=579070.0, ans=0.0 2024-08-10 13:35:46,161 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 13:35:50,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=579170.0, ans=0.2 2024-08-10 13:35:57,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=579170.0, ans=0.125 2024-08-10 13:36:02,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 4, batch 14450, loss[loss=0.1463, beats_loss=0.01001, ecapa_loss=0.0002632, whisper_loss=0.1337, over 20360.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01214, ecapa_loss=0.0002554, whisper_loss=0.095, over 3896816.08 frames. 
], batch size: 78, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:36:02,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=579270.0, ans=0.0 2024-08-10 13:36:13,308 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 13:36:15,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=579370.0, ans=0.2 2024-08-10 13:36:16,267 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 13:36:19,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=579370.0, ans=0.125 2024-08-10 13:36:19,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579370.0, ans=0.1 2024-08-10 13:36:28,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=579470.0, ans=0.125 2024-08-10 13:36:29,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=579470.0, ans=0.1 2024-08-10 13:36:32,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=579470.0, ans=0.2 2024-08-10 13:36:32,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.42 vs. limit=15.0 2024-08-10 13:36:33,373 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.11 vs. limit=5.0 2024-08-10 13:36:38,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. 
limit=15.0 2024-08-10 13:36:49,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=579570.0, ans=0.0 2024-08-10 13:36:53,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=579570.0, ans=0.125 2024-08-10 13:37:00,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=579670.0, ans=0.125 2024-08-10 13:37:44,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 0, loss[loss=0.1051, beats_loss=0.01229, ecapa_loss=0.0002603, whisper_loss=0.09024, over 17782.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01229, ecapa_loss=0.0002603, whisper_loss=0.09024, over 17782.00 frames. ], batch size: 69, lr: 1.31e-02, grad_scale: 8589934592.0 2024-08-10 13:37:44,857 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 13:38:27,636 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on ASR_libri: loss=0.2622, beats_loss=0, ecapa_loss=0.0007699, whisper_loss=0.2545, over 922467.00 frames. 2024-08-10 13:38:42,883 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on SV_voxceleb1: loss=0.006763, beats_loss=0, ecapa_loss=0.0006763, whisper_loss=0, over 939242.00 frames. 2024-08-10 13:39:47,508 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7728, 1.8526, 1.6724, 1.7757, 1.0796, 1.5741, 2.5306, 1.3938], device='cuda:2') 2024-08-10 13:40:39,905 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on AT_audioset: loss=0.02719, beats_loss=0.02719, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
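The per-batch records in this log report a combined objective next to its three distillation components. Using the loss scales from the run configuration in the header (beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0), the logged totals are reproduced by a simple weighted sum. A minimal sketch, assuming this is how train_multi_KD3.py combines the terms (inferred only from matching the logged values, not read from the script):

```python
def combined_loss(beats_loss, ecapa_loss, whisper_loss,
                  beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    # Weighted sum of the three knowledge-distillation losses; the scale
    # defaults are the values from this run's config dump in the header.
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Values from the "Epoch 5, batch 0" record above:
# loss=0.1051, beats_loss=0.01229, ecapa_loss=0.0002603, whisper_loss=0.09024
assert abs(combined_loss(0.01229, 0.0002603, 0.09024) - 0.1051) < 5e-4
```

The same check holds for the other tot_loss records in this excerpt (e.g. loss=0.1097 with beats_loss=0.01214, ecapa_loss=0.0002554, whisper_loss=0.095), which is what suggests the 10x weighting on the ECAPA term.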
2024-08-10 13:40:39,908 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 13:40:51,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=579720.0, ans=0.125 2024-08-10 13:40:57,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=579720.0, ans=0.125 2024-08-10 13:41:06,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=579820.0, ans=0.125 2024-08-10 13:41:34,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=579920.0, ans=0.2 2024-08-10 13:41:51,417 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 33 from LS+wenet, 19 from Vox, 27 from AS 2024-08-10 13:41:53,191 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 3.049e+01 3.546e+01 4.164e+01 6.478e+01, threshold=7.092e+01, percent-clipped=0.0 2024-08-10 13:42:04,318 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.71 vs. limit=10.0 2024-08-10 13:42:11,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=580020.0, ans=0.0 2024-08-10 13:42:46,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 50, loss[loss=0.1144, beats_loss=0.01342, ecapa_loss=0.0002017, whisper_loss=0.09896, over 21523.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01199, ecapa_loss=0.0002547, whisper_loss=0.09522, over 890421.69 frames. ], batch size: 84, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:43:15,321 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
19 from LS+wenet, 17 from Vox, 27 from AS 2024-08-10 13:43:18,153 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.721e-01 2024-08-10 13:43:25,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=580320.0, ans=0.125 2024-08-10 13:43:48,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-10 13:43:55,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=580520.0, ans=0.125 2024-08-10 13:44:11,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=580520.0, ans=0.09899494936611666 2024-08-10 13:44:19,214 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 25 from LS+wenet, 12 from Vox, 21 from AS 2024-08-10 13:44:33,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=580620.0, ans=0.04949747468305833 2024-08-10 13:44:41,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 100, loss[loss=0.116, beats_loss=0.01368, ecapa_loss=0.0001966, whisper_loss=0.1004, over 23674.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01172, ecapa_loss=0.0002506, whisper_loss=0.09603, over 1548474.03 frames. 
], batch size: 89, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:45:02,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=580820.0, ans=0.125 2024-08-10 13:45:35,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=580920.0, ans=0.1 2024-08-10 13:45:43,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+01 3.227e+01 3.615e+01 4.275e+01 6.139e+01, threshold=7.229e+01, percent-clipped=0.0 2024-08-10 13:46:16,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=581120.0, ans=0.0 2024-08-10 13:46:27,369 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 150, loss[loss=0.1095, beats_loss=0.01254, ecapa_loss=0.0002545, whisper_loss=0.09444, over 22518.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01163, ecapa_loss=0.0002525, whisper_loss=0.09637, over 2039389.60 frames. ], batch size: 92, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:46:28,782 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 from AS 2024-08-10 13:46:39,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=581220.0, ans=0.1 2024-08-10 13:47:09,303 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. 
limit=15.0 2024-08-10 13:47:16,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=581520.0, ans=0.0 2024-08-10 13:47:21,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=581520.0, ans=0.0 2024-08-10 13:47:37,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=581620.0, ans=0.125 2024-08-10 13:47:46,537 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 200, loss[loss=0.1021, beats_loss=0.01614, ecapa_loss=0.0002114, whisper_loss=0.08384, over 18555.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01177, ecapa_loss=0.0002511, whisper_loss=0.09634, over 2419040.95 frames. ], batch size: 76, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:47:48,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=581720.0, ans=0.0 2024-08-10 13:47:48,939 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.48 vs. limit=15.0 2024-08-10 13:48:04,022 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-10 13:48:27,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=581920.0, ans=0.0 2024-08-10 13:48:28,241 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.056e+01 3.079e+01 3.499e+01 4.044e+01 6.352e+01, threshold=6.999e+01, percent-clipped=0.0 2024-08-10 13:48:33,016 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
37 from LS+wenet, 19 from Vox, 34 from AS 2024-08-10 13:48:44,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=582120.0, ans=0.125 2024-08-10 13:49:01,012 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 250, loss[loss=0.08006, beats_loss=0.01357, ecapa_loss=0.0003535, whisper_loss=0.06296, over 15564.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01172, ecapa_loss=0.0002506, whisper_loss=0.09744, over 2732815.07 frames. ], batch size: 71, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:49:35,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=582420.0, ans=0.0 2024-08-10 13:49:35,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=582420.0, ans=0.125 2024-08-10 13:49:38,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=582420.0, ans=0.0 2024-08-10 13:49:39,240 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 from AS 2024-08-10 13:49:56,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=22.5 2024-08-10 13:50:07,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=582620.0, ans=0.0 2024-08-10 13:50:16,403 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 300, loss[loss=0.1118, beats_loss=0.01156, ecapa_loss=0.0002779, whisper_loss=0.09749, over 18470.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01175, ecapa_loss=0.000251, whisper_loss=0.09674, over 2978757.84 frames. ], batch size: 76, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:50:34,270 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 22 from Vox, 41 from AS 2024-08-10 13:50:44,215 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.23 vs. limit=15.0 2024-08-10 13:50:45,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=582920.0, ans=0.125 2024-08-10 13:50:45,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=582920.0, ans=0.125 2024-08-10 13:50:53,777 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 22 from Vox, 31 from AS 2024-08-10 13:50:58,482 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.973e+01 3.374e+01 4.127e+01 8.161e+01, threshold=6.749e+01, percent-clipped=1.0 2024-08-10 13:51:05,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2024-08-10 13:51:07,490 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 23 from Vox, 21 from AS 2024-08-10 13:51:19,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=583120.0, ans=0.2 2024-08-10 13:51:19,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=583120.0, ans=0.0 2024-08-10 13:51:22,985 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 from AS 2024-08-10 13:51:23,791 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.65 vs. limit=15.0 2024-08-10 13:51:30,052 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 350, loss[loss=0.08274, beats_loss=0.01178, ecapa_loss=0.0002268, whisper_loss=0.06869, over 14036.00 frames. 
], tot_loss[loss=0.1097, beats_loss=0.01174, ecapa_loss=0.0002492, whisper_loss=0.09546, over 3147096.42 frames. ], batch size: 55, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:51:37,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=583220.0, ans=0.1 2024-08-10 13:51:40,116 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 from AS 2024-08-10 13:51:47,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=583320.0, ans=0.125 2024-08-10 13:52:27,104 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 21 from LS+wenet, 26 from Vox, 47 from AS 2024-08-10 13:52:43,037 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 400, loss[loss=0.09873, beats_loss=0.01514, ecapa_loss=0.0002654, whisper_loss=0.08094, over 16361.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01175, ecapa_loss=0.0002487, whisper_loss=0.0956, over 3296362.79 frames. ], batch size: 68, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:52:50,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=583720.0, ans=0.125 2024-08-10 13:53:25,609 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.918e+01 3.260e+01 3.754e+01 7.890e+01, threshold=6.521e+01, percent-clipped=1.0 2024-08-10 13:53:36,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=584020.0, ans=0.125 2024-08-10 13:53:38,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=584020.0, ans=0.2 2024-08-10 13:53:42,828 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
26 from LS+wenet, 19 from Vox, 38 from AS 2024-08-10 13:53:58,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-08-10 13:53:58,565 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.32 vs. limit=12.0 2024-08-10 13:54:00,546 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 450, loss[loss=0.1392, beats_loss=0.009215, ecapa_loss=0.0002068, whisper_loss=0.128, over 15496.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01168, ecapa_loss=0.0002463, whisper_loss=0.09676, over 3433664.61 frames. ], batch size: 54, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:54:02,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=584220.0, ans=0.125 2024-08-10 13:54:03,811 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 34 from LS+wenet, 17 from Vox, 35 from AS 2024-08-10 13:54:07,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. 
limit=6.0 2024-08-10 13:54:26,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=584320.0, ans=0.1 2024-08-10 13:54:31,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=584420.0, ans=0.0 2024-08-10 13:54:45,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584520.0, ans=0.1 2024-08-10 13:54:51,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=584520.0, ans=0.2 2024-08-10 13:54:53,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=584520.0, ans=0.0 2024-08-10 13:54:54,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=584520.0, ans=0.125 2024-08-10 13:54:57,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=584620.0, ans=0.0 2024-08-10 13:54:58,194 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 16 from Vox, 33 from AS 2024-08-10 13:55:12,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 500, loss[loss=0.09921, beats_loss=0.01126, ecapa_loss=0.0002212, whisper_loss=0.08574, over 13801.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01164, ecapa_loss=0.000245, whisper_loss=0.09664, over 3524595.12 frames. ], batch size: 55, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:55:23,171 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
28 from LS+wenet, 25 from Vox, 39 from AS 2024-08-10 13:55:23,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=584720.0, ans=0.2 2024-08-10 13:55:41,743 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.99 vs. limit=6.0 2024-08-10 13:55:43,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=584920.0, ans=0.2 2024-08-10 13:55:52,505 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.793e+01 3.161e+01 3.607e+01 7.948e+01, threshold=6.322e+01, percent-clipped=1.0 2024-08-10 13:55:53,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=584920.0, ans=0.125 2024-08-10 13:56:04,139 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 22 from Vox, 27 from AS 2024-08-10 13:56:10,516 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 14 from Vox, 33 from AS 2024-08-10 13:56:18,967 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5 2024-08-10 13:56:24,038 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 550, loss[loss=0.1279, beats_loss=0.01146, ecapa_loss=0.0002033, whisper_loss=0.1144, over 23298.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01157, ecapa_loss=0.0002449, whisper_loss=0.09676, over 3597604.42 frames. ], batch size: 90, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:56:55,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.40 vs. limit=22.5 2024-08-10 13:57:04,724 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
25 from LS+wenet, 18 from Vox, 29 from AS 2024-08-10 13:57:18,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=585520.0, ans=0.125 2024-08-10 13:57:23,222 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 from AS 2024-08-10 13:57:33,187 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 from AS 2024-08-10 13:57:36,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 600, loss[loss=0.08448, beats_loss=0.01326, ecapa_loss=0.0002123, whisper_loss=0.0691, over 18207.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.0116, ecapa_loss=0.0002418, whisper_loss=0.09691, over 3635617.88 frames. ], batch size: 71, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:57:42,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=585720.0, ans=0.125 2024-08-10 13:57:54,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=585820.0, ans=15.0 2024-08-10 13:58:04,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=585920.0, ans=0.1 2024-08-10 13:58:10,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=585920.0, ans=0.0 2024-08-10 13:58:16,839 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.818e+01 3.113e+01 3.779e+01 5.763e+01, threshold=6.225e+01, percent-clipped=0.0 2024-08-10 13:58:17,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=585920.0, ans=0.125 2024-08-10 13:58:20,055 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
17 from LS+wenet, 17 from Vox, 33 from AS 2024-08-10 13:58:24,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=586020.0, ans=0.05 2024-08-10 13:58:25,806 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 27 from Vox, 28 from AS 2024-08-10 13:58:26,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=586020.0, ans=0.125 2024-08-10 13:58:36,017 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 26 from Vox, 32 from AS 2024-08-10 13:58:37,570 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 15 from Vox, 33 from AS 2024-08-10 13:58:42,594 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-08-10 13:58:48,514 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 650, loss[loss=0.09583, beats_loss=0.01314, ecapa_loss=0.0002081, whisper_loss=0.08061, over 20164.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01155, ecapa_loss=0.0002424, whisper_loss=0.09767, over 3675933.14 frames. ], batch size: 79, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:59:07,543 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 19 from Vox, 39 from AS 2024-08-10 13:59:10,134 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 15 from Vox, 37 from AS 2024-08-10 13:59:10,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=586320.0, ans=0.0 2024-08-10 13:59:17,456 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. 
limit=15.0 2024-08-10 13:59:53,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=586620.0, ans=0.0 2024-08-10 13:59:58,199 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 700, loss[loss=0.1124, beats_loss=0.01071, ecapa_loss=0.0002959, whisper_loss=0.09869, over 18614.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01152, ecapa_loss=0.0002409, whisper_loss=0.09759, over 3686834.37 frames. ], batch size: 76, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:00:01,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=586720.0, ans=0.09899494936611666 2024-08-10 14:00:30,034 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 from AS 2024-08-10 14:00:38,600 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 2.878e+01 3.235e+01 3.847e+01 7.521e+01, threshold=6.470e+01, percent-clipped=2.0 2024-08-10 14:00:55,403 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 from AS 2024-08-10 14:01:03,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=587120.0, ans=0.0 2024-08-10 14:01:11,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 750, loss[loss=0.1261, beats_loss=0.01064, ecapa_loss=0.0002967, whisper_loss=0.1125, over 21373.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01159, ecapa_loss=0.0002398, whisper_loss=0.09694, over 3726583.22 frames. ], batch size: 85, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:01:25,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=587320.0, ans=0.0 2024-08-10 14:01:27,928 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.93 vs. 
limit=10.0 2024-08-10 14:01:41,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=587420.0, ans=0.0 2024-08-10 14:01:43,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=587420.0, ans=0.0 2024-08-10 14:01:48,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=587420.0, ans=0.125 2024-08-10 14:01:59,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=587520.0, ans=0.125 2024-08-10 14:02:04,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=587520.0, ans=0.125 2024-08-10 14:02:12,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587620.0, ans=0.1 2024-08-10 14:02:21,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 800, loss[loss=0.1027, beats_loss=0.01172, ecapa_loss=0.0002399, whisper_loss=0.08862, over 19932.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01164, ecapa_loss=0.0002386, whisper_loss=0.09605, over 3761622.36 frames. ], batch size: 79, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:02:42,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=587820.0, ans=0.125 2024-08-10 14:02:47,651 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 12 from LS+wenet, 22 from Vox, 32 from AS 2024-08-10 14:02:53,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=587920.0, ans=0.125 2024-08-10 14:02:58,480 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
33 from LS+wenet, 21 from Vox, 41 from AS 2024-08-10 14:03:01,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.857e+01 3.282e+01 4.072e+01 6.223e+01, threshold=6.564e+01, percent-clipped=0.0 2024-08-10 14:03:09,681 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.48 vs. limit=22.5 2024-08-10 14:03:30,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=588120.0, ans=0.5 2024-08-10 14:03:33,901 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 850, loss[loss=0.09117, beats_loss=0.01567, ecapa_loss=0.0002334, whisper_loss=0.07316, over 22124.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01166, ecapa_loss=0.0002379, whisper_loss=0.09521, over 3760179.23 frames. ], batch size: 91, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:03:35,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=588220.0, ans=0.125 2024-08-10 14:03:40,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.54 vs. limit=12.0 2024-08-10 14:03:46,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=588320.0, ans=0.09899494936611666 2024-08-10 14:03:55,037 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2024-08-10 14:04:11,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=588420.0, ans=0.0 2024-08-10 14:04:42,031 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.20 vs. 
limit=15.0 2024-08-10 14:04:50,391 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 900, loss[loss=0.1199, beats_loss=0.009974, ecapa_loss=0.0002451, whisper_loss=0.1074, over 15385.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.0117, ecapa_loss=0.0002362, whisper_loss=0.09529, over 3758833.96 frames. ], batch size: 58, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:04:56,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=588720.0, ans=0.0 2024-08-10 14:04:57,681 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 from AS 2024-08-10 14:05:13,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=588820.0, ans=0.07 2024-08-10 14:05:30,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=588920.0, ans=0.0 2024-08-10 14:05:32,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.732e+01 3.108e+01 3.625e+01 6.653e+01, threshold=6.216e+01, percent-clipped=1.0 2024-08-10 14:05:37,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=589020.0, ans=0.0 2024-08-10 14:05:50,024 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 10 from LS+wenet, 16 from Vox, 29 from AS 2024-08-10 14:05:54,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589120.0, ans=0.1 2024-08-10 14:05:55,337 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 from AS 2024-08-10 14:06:05,870 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 950, loss[loss=0.1262, beats_loss=0.009694, ecapa_loss=0.0002537, whisper_loss=0.114, over 16195.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.0117, ecapa_loss=0.0002346, whisper_loss=0.09482, over 3759453.15 frames. 
], batch size: 65, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:06:15,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=589220.0, ans=0.125 2024-08-10 14:06:27,119 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 from AS 2024-08-10 14:06:35,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=589420.0, ans=0.95 2024-08-10 14:06:42,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2024-08-10 14:07:21,600 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1000, loss[loss=0.1119, beats_loss=0.0109, ecapa_loss=0.0002333, whisper_loss=0.09871, over 21750.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01175, ecapa_loss=0.0002347, whisper_loss=0.09473, over 3758782.62 frames. ], batch size: 82, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:07:25,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589720.0, ans=0.1 2024-08-10 14:07:26,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=589720.0, ans=0.125 2024-08-10 14:07:29,300 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 21 from Vox, 28 from AS 2024-08-10 14:07:29,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=589720.0, ans=0.5 2024-08-10 14:07:33,846 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 14:07:48,497 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-08-10 14:07:54,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.52 vs. limit=22.5 2024-08-10 14:07:57,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=589920.0, ans=0.0 2024-08-10 14:08:04,528 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.773e+01 3.202e+01 3.484e+01 8.284e+01, threshold=6.403e+01, percent-clipped=2.0 2024-08-10 14:08:05,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=589920.0, ans=0.09899494936611666 2024-08-10 14:08:33,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=590120.0, ans=0.125 2024-08-10 14:08:37,980 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1050, loss[loss=0.1137, beats_loss=0.01284, ecapa_loss=0.0001665, whisper_loss=0.09917, over 19458.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01178, ecapa_loss=0.0002327, whisper_loss=0.09511, over 3772838.49 frames. ], batch size: 73, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:08:44,670 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=12.0 2024-08-10 14:08:47,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=590220.0, ans=0.125 2024-08-10 14:09:04,997 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 14:09:17,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=590420.0, ans=0.2 2024-08-10 14:09:21,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590420.0, ans=0.1 2024-08-10 14:09:29,435 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 14:09:34,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=590520.0, ans=0.0 2024-08-10 14:09:55,096 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1100, loss[loss=0.1214, beats_loss=0.009366, ecapa_loss=0.000229, whisper_loss=0.1098, over 16953.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01172, ecapa_loss=0.0002342, whisper_loss=0.09606, over 3794405.77 frames. ], batch size: 65, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:10:09,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=590720.0, ans=0.0 2024-08-10 14:10:21,736 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 11 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 14:10:24,829 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2024-08-10 14:10:29,550 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. limit=10.0 2024-08-10 14:10:42,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.893e+01 3.257e+01 3.748e+01 6.503e+01, threshold=6.515e+01, percent-clipped=1.0 2024-08-10 14:10:46,013 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
35 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 14:10:59,335 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.40 vs. limit=22.5 2024-08-10 14:11:15,727 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1150, loss[loss=0.0833, beats_loss=0.01236, ecapa_loss=0.000232, whisper_loss=0.06862, over 16693.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01176, ecapa_loss=0.0002335, whisper_loss=0.09515, over 3775913.93 frames. ], batch size: 65, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:11:16,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=591220.0, ans=0.0 2024-08-10 14:11:23,086 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 14:11:44,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=591420.0, ans=0.125 2024-08-10 14:11:50,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=591420.0, ans=0.1 2024-08-10 14:12:05,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=591520.0, ans=10.0 2024-08-10 14:12:10,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=591520.0, ans=0.0 2024-08-10 14:12:18,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.71 vs. limit=15.0 2024-08-10 14:12:29,310 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1200, loss[loss=0.1267, beats_loss=0.01132, ecapa_loss=0.000223, whisper_loss=0.1132, over 22541.00 frames. 
], tot_loss[loss=0.109, beats_loss=0.01189, ecapa_loss=0.0002323, whisper_loss=0.09479, over 3800699.65 frames. ], batch size: 89, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:12:46,127 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 14:13:12,131 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+01 2.894e+01 3.360e+01 3.999e+01 6.251e+01, threshold=6.719e+01, percent-clipped=0.0 2024-08-10 14:13:29,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=592020.0, ans=0.04949747468305833 2024-08-10 14:13:39,259 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-10 14:13:41,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592120.0, ans=0.1 2024-08-10 14:13:46,375 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1250, loss[loss=0.1164, beats_loss=0.01067, ecapa_loss=0.0002572, whisper_loss=0.1032, over 23937.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.0119, ecapa_loss=0.0002325, whisper_loss=0.09445, over 3808302.73 frames. ], batch size: 95, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:14:30,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.91 vs. limit=22.5 2024-08-10 14:14:34,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592520.0, ans=0.1 2024-08-10 14:14:42,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=592520.0, ans=0.125 2024-08-10 14:14:53,902 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. 
limit=15.0 2024-08-10 14:15:00,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=592620.0, ans=0.125 2024-08-10 14:15:03,221 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1300, loss[loss=0.1206, beats_loss=0.0108, ecapa_loss=0.000232, whisper_loss=0.1075, over 23561.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01187, ecapa_loss=0.0002321, whisper_loss=0.0945, over 3824582.92 frames. ], batch size: 91, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:15:03,355 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 14:15:11,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=592720.0, ans=0.95 2024-08-10 14:15:38,236 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.28 vs. limit=15.0 2024-08-10 14:15:44,297 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 14:15:45,457 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 14:15:50,106 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.731e+01 3.070e+01 3.519e+01 6.243e+01, threshold=6.140e+01, percent-clipped=0.0 2024-08-10 14:16:18,265 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 14:16:24,069 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1350, loss[loss=0.1184, beats_loss=0.009732, ecapa_loss=0.0002873, whisper_loss=0.1058, over 22052.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01188, ecapa_loss=0.0002312, whisper_loss=0.09463, over 3824006.29 frames. 
], batch size: 92, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:16:26,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0 2024-08-10 14:16:50,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=593320.0, ans=0.0 2024-08-10 14:16:52,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=593320.0, ans=0.125 2024-08-10 14:17:02,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=593420.0, ans=0.0 2024-08-10 14:17:06,616 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.83 vs. limit=15.0 2024-08-10 14:17:19,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=593520.0, ans=0.0 2024-08-10 14:17:41,010 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 14:17:44,546 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1400, loss[loss=0.1207, beats_loss=0.01074, ecapa_loss=0.000223, whisper_loss=0.1078, over 20485.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01191, ecapa_loss=0.0002296, whisper_loss=0.0937, over 3805921.98 frames. ], batch size: 80, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:17:45,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=593720.0, ans=0.2 2024-08-10 14:17:50,977 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
23 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 14:17:54,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=593720.0, ans=0.125 2024-08-10 14:18:00,342 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-10 14:18:24,819 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-10 14:18:25,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.747e+01 3.189e+01 3.732e+01 5.782e+01, threshold=6.377e+01, percent-clipped=0.0 2024-08-10 14:18:25,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=593920.0, ans=0.125 2024-08-10 14:18:54,091 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-10 14:18:56,614 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1450, loss[loss=0.079, beats_loss=0.01304, ecapa_loss=0.0001753, whisper_loss=0.0642, over 15798.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01186, ecapa_loss=0.0002296, whisper_loss=0.09376, over 3806034.55 frames. ], batch size: 61, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:19:22,585 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 14:19:24,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=594220.0, ans=0.0 2024-08-10 14:19:29,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=594220.0, ans=0.0 2024-08-10 14:19:38,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=594320.0, ans=0.2 2024-08-10 14:19:39,923 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.89 vs. limit=10.0 2024-08-10 14:19:56,634 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 14:20:01,708 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 14:20:07,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0 2024-08-10 14:20:08,679 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2024-08-10 14:20:29,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=594620.0, ans=0.125 2024-08-10 14:20:36,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=594620.0, ans=0.0 2024-08-10 14:20:40,754 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1500, loss[loss=0.07373, beats_loss=0.01306, ecapa_loss=0.0001795, whisper_loss=0.05887, over 14526.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01187, ecapa_loss=0.0002295, whisper_loss=0.09345, over 3813516.73 frames. 
], batch size: 59, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:20:42,711 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-10 14:20:44,300 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 14:20:47,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594720.0, ans=0.1 2024-08-10 14:20:50,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=594720.0, ans=0.1 2024-08-10 14:21:08,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=594820.0, ans=0.0 2024-08-10 14:21:17,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594920.0, ans=0.1 2024-08-10 14:21:24,556 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.686e+01 3.011e+01 3.504e+01 1.040e+02, threshold=6.023e+01, percent-clipped=2.0 2024-08-10 14:21:30,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=595020.0, ans=0.1 2024-08-10 14:21:51,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=595120.0, ans=0.125 2024-08-10 14:21:59,056 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1550, loss[loss=0.08998, beats_loss=0.01306, ecapa_loss=0.0002088, whisper_loss=0.07484, over 22940.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01187, ecapa_loss=0.0002289, whisper_loss=0.09353, over 3833716.94 frames. 
], batch size: 92, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:22:07,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=595220.0, ans=0.125 2024-08-10 14:22:18,532 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 30 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 14:22:36,587 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-10 14:22:45,959 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 14:22:48,914 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-10 14:22:52,820 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2024-08-10 14:23:07,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=595620.0, ans=0.125 2024-08-10 14:23:13,473 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-10 14:23:15,845 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1600, loss[loss=0.0946, beats_loss=0.01264, ecapa_loss=0.0001956, whisper_loss=0.08001, over 14244.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01187, ecapa_loss=0.0002302, whisper_loss=0.09425, over 3836889.16 frames. ], batch size: 54, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:23:21,050 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-10 14:23:22,547 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 14:23:32,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.23 vs. 
limit=15.0 2024-08-10 14:23:34,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=595820.0, ans=0.04949747468305833 2024-08-10 14:23:43,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=595820.0, ans=0.125 2024-08-10 14:23:51,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=595920.0, ans=0.1 2024-08-10 14:23:59,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.802e+01 3.147e+01 3.611e+01 5.289e+01, threshold=6.294e+01, percent-clipped=0.0 2024-08-10 14:24:06,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=596020.0, ans=0.1 2024-08-10 14:24:16,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=596020.0, ans=0.125 2024-08-10 14:24:18,412 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 14:24:24,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=596120.0, ans=0.125 2024-08-10 14:24:37,085 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1650, loss[loss=0.1025, beats_loss=0.01473, ecapa_loss=0.0002028, whisper_loss=0.0857, over 22550.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01188, ecapa_loss=0.00023, whisper_loss=0.09499, over 3883300.13 frames. 
], batch size: 91, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:25:00,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=596320.0, ans=0.2 2024-08-10 14:25:13,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=596420.0, ans=0.125 2024-08-10 14:25:32,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=596520.0, ans=0.125 2024-08-10 14:25:52,984 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1700, loss[loss=0.1059, beats_loss=0.01221, ecapa_loss=0.0002537, whisper_loss=0.09114, over 22098.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01189, ecapa_loss=0.0002296, whisper_loss=0.09494, over 3865322.96 frames. ], batch size: 91, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:25:58,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=596720.0, ans=0.2 2024-08-10 14:26:00,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=596720.0, ans=0.125 2024-08-10 14:26:21,970 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 14:26:34,979 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.744e+01 3.070e+01 3.564e+01 5.631e+01, threshold=6.139e+01, percent-clipped=0.0 2024-08-10 14:26:53,090 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 14:27:00,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.99 vs. 
limit=15.0 2024-08-10 14:27:05,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.58 vs. limit=15.0 2024-08-10 14:27:07,576 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 14:27:08,787 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1750, loss[loss=0.1091, beats_loss=0.009293, ecapa_loss=0.0002699, whisper_loss=0.0971, over 15217.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.0118, ecapa_loss=0.0002321, whisper_loss=0.09409, over 3858837.17 frames. ], batch size: 62, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:27:21,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=597220.0, ans=0.0 2024-08-10 14:27:34,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=597320.0, ans=0.125 2024-08-10 14:28:06,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=597520.0, ans=0.025 2024-08-10 14:28:21,515 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-10 14:28:26,702 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1800, loss[loss=0.1075, beats_loss=0.01359, ecapa_loss=0.0002153, whisper_loss=0.09173, over 19755.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01177, ecapa_loss=0.000232, whisper_loss=0.09436, over 3877914.88 frames. 
], batch size: 75, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:28:27,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=597720.0, ans=0.04949747468305833 2024-08-10 14:28:53,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=597820.0, ans=0.0 2024-08-10 14:28:54,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=597920.0, ans=0.02 2024-08-10 14:29:07,576 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.774e+01 3.082e+01 3.729e+01 4.718e+01, threshold=6.165e+01, percent-clipped=0.0 2024-08-10 14:29:09,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=598020.0, ans=0.125 2024-08-10 14:29:27,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=598120.0, ans=0.2 2024-08-10 14:29:37,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=598120.0, ans=0.0 2024-08-10 14:29:40,248 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1850, loss[loss=0.1111, beats_loss=0.01083, ecapa_loss=0.0002187, whisper_loss=0.09808, over 15539.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01184, ecapa_loss=0.000233, whisper_loss=0.09364, over 3838089.33 frames. 
], batch size: 58, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:29:51,447 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.405e-01 2024-08-10 14:30:08,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=598320.0, ans=0.0 2024-08-10 14:30:22,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.90 vs. limit=22.5 2024-08-10 14:30:36,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598520.0, ans=0.1 2024-08-10 14:30:40,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.69 vs. limit=10.0 2024-08-10 14:30:46,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=598620.0, ans=0.04949747468305833 2024-08-10 14:30:52,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=598620.0, ans=0.125 2024-08-10 14:30:57,512 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1900, loss[loss=0.1029, beats_loss=0.01074, ecapa_loss=0.0002534, whisper_loss=0.08959, over 17431.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01183, ecapa_loss=0.0002358, whisper_loss=0.09357, over 3806823.10 frames. ], batch size: 68, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:31:07,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=598720.0, ans=0.0 2024-08-10 14:31:10,008 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 29 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-10 14:31:26,620 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 14:31:27,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=12.0 2024-08-10 14:31:41,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.853e+01 3.252e+01 3.827e+01 6.548e+01, threshold=6.504e+01, percent-clipped=1.0 2024-08-10 14:31:41,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598920.0, ans=0.1 2024-08-10 14:31:50,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=599020.0, ans=0.015 2024-08-10 14:32:11,133 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 14:32:14,413 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 1950, loss[loss=0.108, beats_loss=0.01107, ecapa_loss=0.0002333, whisper_loss=0.09458, over 17709.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01182, ecapa_loss=0.0002373, whisper_loss=0.09381, over 3812786.54 frames. ], batch size: 65, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:32:19,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=599220.0, ans=0.2 2024-08-10 14:32:22,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=599220.0, ans=0.125 2024-08-10 14:32:26,641 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 14:32:39,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.14 vs. 
limit=15.0 2024-08-10 14:32:52,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=599420.0, ans=0.1 2024-08-10 14:33:13,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=599620.0, ans=0.2 2024-08-10 14:33:24,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.30 vs. limit=22.5 2024-08-10 14:33:24,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=599620.0, ans=0.0 2024-08-10 14:33:30,112 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2000, loss[loss=0.1158, beats_loss=0.008704, ecapa_loss=0.0002953, whisper_loss=0.1042, over 17898.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01185, ecapa_loss=0.0002396, whisper_loss=0.09377, over 3829781.01 frames. ], batch size: 71, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:33:40,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.88 vs. limit=22.5 2024-08-10 14:33:45,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=599820.0, ans=0.125 2024-08-10 14:34:08,830 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 14:34:16,214 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.787e+01 3.156e+01 3.560e+01 5.120e+01, threshold=6.313e+01, percent-clipped=0.0 2024-08-10 14:34:18,915 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 14:34:19,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=600020.0, ans=0.125 2024-08-10 14:34:21,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.04 vs. limit=15.0 2024-08-10 14:34:34,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=600120.0, ans=0.2 2024-08-10 14:34:34,791 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2024-08-10 14:34:37,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=600120.0, ans=0.125 2024-08-10 14:34:50,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2050, loss[loss=0.09364, beats_loss=0.01423, ecapa_loss=0.0002219, whisper_loss=0.07719, over 21554.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01188, ecapa_loss=0.0002405, whisper_loss=0.09381, over 3828835.88 frames. ], batch size: 89, lr: 1.29e-02, grad_scale: 34359738368.0 2024-08-10 14:35:01,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=600220.0, ans=0.125 2024-08-10 14:35:01,426 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. 
limit=15.0 2024-08-10 14:35:09,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=600320.0, ans=0.125 2024-08-10 14:35:23,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=600420.0, ans=0.125 2024-08-10 14:35:30,131 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-10 14:35:31,615 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 14:35:36,235 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 14:35:39,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=600520.0, ans=0.125 2024-08-10 14:35:50,627 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 14:35:55,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600620.0, ans=0.1 2024-08-10 14:36:04,722 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2100, loss[loss=0.1196, beats_loss=0.0133, ecapa_loss=0.0001935, whisper_loss=0.1044, over 23415.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01193, ecapa_loss=0.0002401, whisper_loss=0.09337, over 3808417.97 frames. 
], batch size: 92, lr: 1.29e-02, grad_scale: 34359738368.0 2024-08-10 14:36:09,970 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.073e-02 2024-08-10 14:36:24,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=600820.0, ans=0.125 2024-08-10 14:36:29,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=600820.0, ans=0.0 2024-08-10 14:36:46,895 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.750e+01 3.110e+01 3.646e+01 5.998e+01, threshold=6.220e+01, percent-clipped=0.0 2024-08-10 14:36:47,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=600920.0, ans=0.125 2024-08-10 14:37:19,336 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2150, loss[loss=0.1159, beats_loss=0.01148, ecapa_loss=0.0002698, whisper_loss=0.1017, over 21748.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01194, ecapa_loss=0.0002409, whisper_loss=0.09388, over 3827637.90 frames. ], batch size: 87, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:37:22,672 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-10 14:37:25,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=12.0 2024-08-10 14:37:28,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=601220.0, ans=0.1 2024-08-10 14:37:32,234 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
27 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 14:37:35,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=601320.0, ans=0.125 2024-08-10 14:37:40,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=601320.0, ans=0.125 2024-08-10 14:37:40,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=601320.0, ans=0.0 2024-08-10 14:37:45,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=601320.0, ans=0.125 2024-08-10 14:37:45,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=601320.0, ans=0.0 2024-08-10 14:37:48,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=601320.0, ans=0.0 2024-08-10 14:37:48,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=601320.0, ans=0.125 2024-08-10 14:38:35,961 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2200, loss[loss=0.1171, beats_loss=0.01012, ecapa_loss=0.0002172, whisper_loss=0.1048, over 20779.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01196, ecapa_loss=0.0002409, whisper_loss=0.09421, over 3816822.97 frames. 
], batch size: 77, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:38:37,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=601720.0, ans=0.07 2024-08-10 14:38:39,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=601720.0, ans=0.04949747468305833 2024-08-10 14:38:39,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=601720.0, ans=0.0 2024-08-10 14:38:39,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.62 vs. limit=15.0 2024-08-10 14:39:04,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=601920.0, ans=0.1 2024-08-10 14:39:11,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=12.0 2024-08-10 14:39:12,022 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 14:39:14,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.850e+01 3.154e+01 3.768e+01 5.598e+01, threshold=6.309e+01, percent-clipped=0.0 2024-08-10 14:39:16,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.39 vs. limit=12.0 2024-08-10 14:39:20,983 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
15 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 14:39:38,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=602120.0, ans=0.125 2024-08-10 14:39:42,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2250, loss[loss=0.08771, beats_loss=0.01356, ecapa_loss=0.0002656, whisper_loss=0.0715, over 21338.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01202, ecapa_loss=0.0002394, whisper_loss=0.09443, over 3831863.15 frames. ], batch size: 92, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:39:45,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=602220.0, ans=0.125 2024-08-10 14:39:48,274 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 14:39:51,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=602220.0, ans=0.125 2024-08-10 14:39:57,929 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.95 vs. limit=10.0 2024-08-10 14:40:10,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602420.0, ans=0.1 2024-08-10 14:40:14,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=602420.0, ans=0.5 2024-08-10 14:40:15,731 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2024-08-10 14:40:28,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.41 vs. 
limit=15.0 2024-08-10 14:40:47,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2300, loss[loss=0.107, beats_loss=0.01253, ecapa_loss=0.0002301, whisper_loss=0.09216, over 18332.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01199, ecapa_loss=0.000238, whisper_loss=0.09533, over 3857895.45 frames. ], batch size: 73, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:41:01,241 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 14:41:03,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=602820.0, ans=0.2 2024-08-10 14:41:23,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.867e+01 3.175e+01 3.741e+01 6.464e+01, threshold=6.350e+01, percent-clipped=1.0 2024-08-10 14:41:23,430 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 14:41:44,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=603120.0, ans=0.125 2024-08-10 14:41:51,344 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2350, loss[loss=0.09765, beats_loss=0.0128, ecapa_loss=0.0002603, whisper_loss=0.08225, over 21720.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01193, ecapa_loss=0.0002401, whisper_loss=0.09484, over 3841992.75 frames. ], batch size: 90, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:41:59,164 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 14:42:04,609 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 14:42:06,938 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 14:42:09,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5 2024-08-10 14:42:21,103 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 14:42:31,119 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 23 from LS+wenet, 22 from Vox, 50 fro AS 2024-08-10 14:42:35,632 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.85 vs. limit=22.5 2024-08-10 14:42:39,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=603520.0, ans=0.125 2024-08-10 14:42:44,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=603620.0, ans=0.1 2024-08-10 14:42:55,238 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2400, loss[loss=0.1188, beats_loss=0.01352, ecapa_loss=0.0001782, whisper_loss=0.1035, over 24738.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01189, ecapa_loss=0.0002408, whisper_loss=0.09505, over 3849671.62 frames. ], batch size: 93, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:42:55,369 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 14:42:59,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=603720.0, ans=0.0 2024-08-10 14:43:02,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2024-08-10 14:43:14,828 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
22 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 14:43:21,669 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 14:43:28,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=603920.0, ans=0.1 2024-08-10 14:43:31,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.78 vs. limit=22.5 2024-08-10 14:43:31,812 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.726e+01 3.127e+01 3.676e+01 5.177e+01, threshold=6.255e+01, percent-clipped=0.0 2024-08-10 14:43:31,957 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 14:43:49,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=604120.0, ans=15.0 2024-08-10 14:43:56,720 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-10 14:44:00,606 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2450, loss[loss=0.09467, beats_loss=0.01162, ecapa_loss=0.0002486, whisper_loss=0.08056, over 13537.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01196, ecapa_loss=0.0002404, whisper_loss=0.094, over 3838992.04 frames. 
], batch size: 54, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:44:01,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=604220.0, ans=0.2 2024-08-10 14:44:09,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=604220.0, ans=0.1 2024-08-10 14:44:10,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=604220.0, ans=0.125 2024-08-10 14:44:19,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=604320.0, ans=0.125 2024-08-10 14:44:24,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.23 vs. limit=10.0 2024-08-10 14:44:29,383 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 14:44:30,063 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2024-08-10 14:44:36,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=604420.0, ans=0.125 2024-08-10 14:44:40,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=604520.0, ans=0.125 2024-08-10 14:44:41,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=604520.0, ans=0.0 2024-08-10 14:44:48,921 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 14:44:49,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=604520.0, ans=0.0 2024-08-10 14:44:49,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=604520.0, ans=0.0 2024-08-10 14:44:50,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=604520.0, ans=0.2 2024-08-10 14:45:05,748 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2500, loss[loss=0.08725, beats_loss=0.0171, ecapa_loss=0.0002057, whisper_loss=0.06809, over 22801.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01187, ecapa_loss=0.0002417, whisper_loss=0.09442, over 3856848.72 frames. ], batch size: 94, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:45:07,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2024-08-10 14:45:08,656 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 14:45:17,613 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 14:45:30,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=604820.0, ans=0.125 2024-08-10 14:45:31,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=604920.0, ans=0.0 2024-08-10 14:45:39,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=604920.0, ans=0.015 2024-08-10 14:45:41,110 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 14:45:42,124 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.835e+01 3.123e+01 3.643e+01 5.985e+01, threshold=6.245e+01, percent-clipped=0.0 2024-08-10 14:45:42,324 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 35 from Vox, 29 fro AS 2024-08-10 14:45:44,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.87 vs. limit=15.0 2024-08-10 14:45:46,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=605020.0, ans=0.0 2024-08-10 14:45:51,066 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.68 vs. limit=10.0 2024-08-10 14:45:55,433 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 14:46:05,818 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 15 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-10 14:46:11,032 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2550, loss[loss=0.0956, beats_loss=0.01344, ecapa_loss=0.0001653, whisper_loss=0.0805, over 14475.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01177, ecapa_loss=0.000242, whisper_loss=0.09527, over 3851870.63 frames. ], batch size: 55, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:46:14,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=605220.0, ans=15.0 2024-08-10 14:46:19,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=605220.0, ans=0.2 2024-08-10 14:46:43,200 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 14:47:06,033 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.90 vs. limit=15.0 2024-08-10 14:47:13,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=605620.0, ans=0.09899494936611666 2024-08-10 14:47:15,652 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2600, loss[loss=0.1278, beats_loss=0.01106, ecapa_loss=0.0002935, whisper_loss=0.1138, over 21826.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01178, ecapa_loss=0.0002411, whisper_loss=0.09545, over 3849431.54 frames. ], batch size: 88, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:47:25,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2024-08-10 14:47:41,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=605920.0, ans=0.125 2024-08-10 14:47:51,316 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.79 vs. limit=15.0 2024-08-10 14:47:51,774 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.738e+01 3.065e+01 3.602e+01 6.052e+01, threshold=6.131e+01, percent-clipped=0.0 2024-08-10 14:48:02,763 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
20 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-10 14:48:10,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=606120.0, ans=0.2 2024-08-10 14:48:15,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=606120.0, ans=0.125 2024-08-10 14:48:20,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2650, loss[loss=0.1193, beats_loss=0.01133, ecapa_loss=0.0002238, whisper_loss=0.1057, over 15285.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01183, ecapa_loss=0.0002425, whisper_loss=0.0951, over 3874096.22 frames. ], batch size: 56, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:48:24,754 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-10 14:48:27,306 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 14:48:27,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=606220.0, ans=0.125 2024-08-10 14:48:28,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.78 vs. limit=15.0 2024-08-10 14:48:32,672 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 14:48:34,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=606320.0, ans=0.0 2024-08-10 14:48:39,040 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-10 14:48:54,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5 2024-08-10 14:49:02,331 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 14:49:07,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=606520.0, ans=0.125 2024-08-10 14:49:07,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=606520.0, ans=0.025 2024-08-10 14:49:07,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=606520.0, ans=0.0 2024-08-10 14:49:15,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=606620.0, ans=0.0 2024-08-10 14:49:18,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606620.0, ans=0.1 2024-08-10 14:49:20,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=606620.0, ans=0.125 2024-08-10 14:49:25,687 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2700, loss[loss=0.1057, beats_loss=0.01213, ecapa_loss=0.0002435, whisper_loss=0.09116, over 21662.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.0119, ecapa_loss=0.0002433, whisper_loss=0.09501, over 3876920.73 frames. ], batch size: 87, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:49:27,236 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 15 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-10 14:49:30,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=606720.0, ans=0.125 2024-08-10 14:49:38,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.85 vs. 
limit=15.0 2024-08-10 14:49:46,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2024-08-10 14:49:59,814 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 14:50:02,245 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 3.024e+01 3.379e+01 4.188e+01 8.555e+01, threshold=6.757e+01, percent-clipped=2.0 2024-08-10 14:50:30,989 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2750, loss[loss=0.1323, beats_loss=0.00767, ecapa_loss=0.0002845, whisper_loss=0.1218, over 15489.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01194, ecapa_loss=0.0002443, whisper_loss=0.09443, over 3861472.48 frames. ], batch size: 59, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:50:34,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=607220.0, ans=0.0 2024-08-10 14:50:48,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=607320.0, ans=0.125 2024-08-10 14:51:15,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=607520.0, ans=0.2 2024-08-10 14:51:16,186 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 33 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 14:51:37,014 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2800, loss[loss=0.1098, beats_loss=0.01098, ecapa_loss=0.0002415, whisper_loss=0.09642, over 16086.00 frames. ], tot_loss[loss=0.109, beats_loss=0.0119, ecapa_loss=0.0002422, whisper_loss=0.09469, over 3847310.02 frames. ], batch size: 62, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:51:38,522 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
17 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 14:51:47,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2024-08-10 14:52:10,180 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-10 14:52:14,105 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 2.765e+01 3.202e+01 3.631e+01 5.642e+01, threshold=6.403e+01, percent-clipped=0.0 2024-08-10 14:52:40,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=608120.0, ans=0.2 2024-08-10 14:52:41,501 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 14:52:42,562 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2850, loss[loss=0.1188, beats_loss=0.0117, ecapa_loss=0.0002238, whisper_loss=0.1048, over 23286.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01188, ecapa_loss=0.000242, whisper_loss=0.09506, over 3841885.05 frames. ], batch size: 90, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:52:51,245 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2024-08-10 14:52:52,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=608220.0, ans=0.125 2024-08-10 14:52:58,777 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 14:53:13,037 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 14:53:16,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=608420.0, ans=0.07 2024-08-10 14:53:33,614 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 14:53:39,126 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.884e-03 2024-08-10 14:53:45,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=608620.0, ans=0.125 2024-08-10 14:53:47,894 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2900, loss[loss=0.09619, beats_loss=0.01308, ecapa_loss=0.0002769, whisper_loss=0.08034, over 17486.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01189, ecapa_loss=0.0002447, whisper_loss=0.0952, over 3849069.61 frames. ], batch size: 72, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:53:54,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=608720.0, ans=0.125 2024-08-10 14:53:57,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=608720.0, ans=0.125 2024-08-10 14:54:00,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=608820.0, ans=0.125 2024-08-10 14:54:20,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=608920.0, ans=0.125 2024-08-10 14:54:23,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2024-08-10 14:54:24,849 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.824e+01 3.286e+01 3.731e+01 5.146e+01, threshold=6.573e+01, percent-clipped=0.0 2024-08-10 14:54:36,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.54 vs. 
limit=15.0 2024-08-10 14:54:53,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=609220.0, ans=0.0 2024-08-10 14:54:53,794 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 2950, loss[loss=0.1175, beats_loss=0.01013, ecapa_loss=0.0002433, whisper_loss=0.1049, over 21343.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01195, ecapa_loss=0.0002443, whisper_loss=0.09455, over 3857517.29 frames. ], batch size: 85, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:54:58,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=609220.0, ans=0.1 2024-08-10 14:55:08,269 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 14:55:25,061 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 14:55:25,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=609420.0, ans=0.1 2024-08-10 14:55:37,903 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-10 14:55:43,129 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 32 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 14:55:57,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=609720.0, ans=0.0 2024-08-10 14:55:58,277 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3000, loss[loss=0.1085, beats_loss=0.01548, ecapa_loss=0.0002055, whisper_loss=0.09095, over 23242.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01202, ecapa_loss=0.0002423, whisper_loss=0.09486, over 3880644.52 frames. 
], batch size: 94, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:55:58,277 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 14:56:35,264 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on ASR_libri: loss=0.2643, beats_loss=0, ecapa_loss=0.0007548, whisper_loss=0.2568, over 922467.00 frames. 2024-08-10 14:56:41,566 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.9206, 1.9227, 1.6631, 2.2957], device='cuda:2') 2024-08-10 14:56:45,475 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([0.0181, 0.0504, 0.0069, 3.4134, 0.0167, 0.0834, 0.0460, 0.0695], device='cuda:2') 2024-08-10 14:56:52,865 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on SV_voxceleb1: loss=0.006405, beats_loss=0, ecapa_loss=0.0006405, whisper_loss=0, over 939242.00 frames. 2024-08-10 14:58:43,697 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on AT_audioset: loss=0.02683, beats_loss=0.02683, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 14:58:43,707 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 14:58:51,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=609720.0, ans=0.0 2024-08-10 14:58:53,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=609720.0, ans=0.2 2024-08-10 14:58:57,798 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 14:58:58,421 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. 
limit=15.0 2024-08-10 14:59:10,348 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-08-10 14:59:19,872 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 3.032e+01 3.491e+01 3.911e+01 5.761e+01, threshold=6.982e+01, percent-clipped=0.0 2024-08-10 14:59:33,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=610020.0, ans=0.025 2024-08-10 14:59:39,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2024-08-10 14:59:48,801 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3050, loss[loss=0.1014, beats_loss=0.01407, ecapa_loss=0.0002121, whisper_loss=0.08519, over 22851.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01191, ecapa_loss=0.0002422, whisper_loss=0.09548, over 3891683.06 frames. ], batch size: 94, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:59:55,171 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 12 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 15:00:00,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=610320.0, ans=0.125 2024-08-10 15:00:06,557 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=22.5 2024-08-10 15:00:16,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs. limit=6.0 2024-08-10 15:00:33,194 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
18 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 15:00:54,626 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3100, loss[loss=0.106, beats_loss=0.01169, ecapa_loss=0.0002403, whisper_loss=0.09194, over 15808.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01195, ecapa_loss=0.0002416, whisper_loss=0.09546, over 3912397.84 frames. ], batch size: 62, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:01:00,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=610720.0, ans=0.125 2024-08-10 15:01:33,144 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.743e+01 3.024e+01 3.559e+01 5.609e+01, threshold=6.048e+01, percent-clipped=0.0 2024-08-10 15:01:44,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2024-08-10 15:01:46,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611020.0, ans=0.1 2024-08-10 15:01:49,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=611120.0, ans=0.0 2024-08-10 15:01:59,190 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 15:02:03,409 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3150, loss[loss=0.1117, beats_loss=0.01244, ecapa_loss=0.0002325, whisper_loss=0.09696, over 22720.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01192, ecapa_loss=0.0002426, whisper_loss=0.09529, over 3881982.30 frames. ], batch size: 88, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:02:06,073 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.39 vs. 
limit=10.0 2024-08-10 15:02:09,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.0 2024-08-10 15:02:12,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=611220.0, ans=0.1 2024-08-10 15:02:24,028 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-10 15:02:44,387 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=8.713e-03 2024-08-10 15:02:58,697 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-10 15:03:01,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=611620.0, ans=0.125 2024-08-10 15:03:13,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=611620.0, ans=0.0 2024-08-10 15:03:14,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=611720.0, ans=0.125 2024-08-10 15:03:15,472 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3200, loss[loss=0.137, beats_loss=0.01123, ecapa_loss=0.0002118, whisper_loss=0.1237, over 22133.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01189, ecapa_loss=0.0002421, whisper_loss=0.09589, over 3888209.10 frames. ], batch size: 87, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:03:23,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=611720.0, ans=0.125 2024-08-10 15:03:39,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=611820.0, ans=0.125 2024-08-10 15:03:43,885 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 15:03:52,328 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 33 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 15:03:53,653 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 30 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-10 15:03:56,626 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.779e+01 3.150e+01 3.545e+01 6.901e+01, threshold=6.301e+01, percent-clipped=2.0 2024-08-10 15:04:21,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=612120.0, ans=0.125 2024-08-10 15:04:27,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=612220.0, ans=0.1 2024-08-10 15:04:28,365 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3250, loss[loss=0.1158, beats_loss=0.01169, ecapa_loss=0.0002857, whisper_loss=0.1012, over 22162.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.0118, ecapa_loss=0.0002446, whisper_loss=0.09668, over 3889675.87 frames. ], batch size: 92, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:04:33,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2024-08-10 15:04:38,526 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-10 15:04:40,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=612220.0, ans=0.1 2024-08-10 15:04:50,129 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 15:04:54,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.37 vs. 
limit=15.0 2024-08-10 15:05:11,991 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=15.0 2024-08-10 15:05:16,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=612520.0, ans=0.0 2024-08-10 15:05:24,914 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 15:05:29,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=612620.0, ans=0.125 2024-08-10 15:05:31,723 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 15:05:40,801 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3300, loss[loss=0.103, beats_loss=0.01132, ecapa_loss=0.0002215, whisper_loss=0.08943, over 22312.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01175, ecapa_loss=0.0002451, whisper_loss=0.09674, over 3878609.99 frames. ], batch size: 89, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:05:40,932 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 15:05:41,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=612720.0, ans=0.1 2024-08-10 15:05:54,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=612820.0, ans=0.125 2024-08-10 15:05:59,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=612820.0, ans=0.125 2024-08-10 15:06:00,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=612820.0, ans=0.1 2024-08-10 15:06:12,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-08-10 15:06:15,258 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 15:06:21,126 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-10 15:06:22,299 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.755e+01 3.072e+01 3.647e+01 1.345e+02, threshold=6.143e+01, percent-clipped=1.0 2024-08-10 15:06:23,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=613020.0, ans=0.125 2024-08-10 15:06:37,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=613020.0, ans=0.2 2024-08-10 15:06:42,220 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-08-10 15:06:44,302 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 15:06:54,626 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3350, loss[loss=0.116, beats_loss=0.01177, ecapa_loss=0.0002478, whisper_loss=0.1017, over 17010.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01175, ecapa_loss=0.0002452, whisper_loss=0.0968, over 3871663.60 frames. ], batch size: 69, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:06:55,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=613220.0, ans=0.0 2024-08-10 15:06:57,932 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 15:07:04,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=613220.0, ans=0.0 2024-08-10 15:07:14,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=613320.0, ans=0.125 2024-08-10 15:07:32,036 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 15:07:41,983 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 15:08:05,160 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 15:08:08,158 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3400, loss[loss=0.1204, beats_loss=0.0118, ecapa_loss=0.0001777, whisper_loss=0.1069, over 23909.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01178, ecapa_loss=0.0002437, whisper_loss=0.09694, over 3902820.23 frames. ], batch size: 90, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:08:21,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=613820.0, ans=0.1 2024-08-10 15:08:36,981 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 15:08:44,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=613920.0, ans=0.1 2024-08-10 15:08:49,553 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.884e+01 3.210e+01 3.796e+01 7.234e+01, threshold=6.419e+01, percent-clipped=1.0 2024-08-10 15:08:51,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=614020.0, ans=0.04949747468305833 2024-08-10 15:08:52,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=614020.0, ans=0.0 2024-08-10 15:08:55,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=15.0 2024-08-10 15:09:02,754 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 15:09:04,115 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 15:09:21,329 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3450, loss[loss=0.1143, beats_loss=0.01295, ecapa_loss=0.0002443, whisper_loss=0.09894, over 22574.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01174, ecapa_loss=0.0002442, whisper_loss=0.09711, over 3924084.46 frames. 
], batch size: 94, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:09:31,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=614220.0, ans=0.09899494936611666 2024-08-10 15:09:35,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=614320.0, ans=0.125 2024-08-10 15:09:54,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614420.0, ans=0.1 2024-08-10 15:09:54,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614420.0, ans=0.1 2024-08-10 15:09:55,603 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 15:10:08,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=614520.0, ans=0.125 2024-08-10 15:10:34,014 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3500, loss[loss=0.1345, beats_loss=0.01033, ecapa_loss=0.0002784, whisper_loss=0.1214, over 23649.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01174, ecapa_loss=0.0002451, whisper_loss=0.09706, over 3907699.38 frames. 
], batch size: 92, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:10:55,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=614820.0, ans=0.1 2024-08-10 15:11:05,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=614920.0, ans=0.0 2024-08-10 15:11:15,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.335e+01 2.754e+01 3.128e+01 3.525e+01 7.630e+01, threshold=6.256e+01, percent-clipped=1.0 2024-08-10 15:11:27,029 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:11:43,202 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 25 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 15:11:46,899 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3550, loss[loss=0.1116, beats_loss=0.01155, ecapa_loss=0.0002542, whisper_loss=0.09755, over 22593.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01182, ecapa_loss=0.000244, whisper_loss=0.09616, over 3928257.68 frames. ], batch size: 90, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:12:20,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=615420.0, ans=0.125 2024-08-10 15:12:20,911 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 15:12:58,948 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3600, loss[loss=0.1076, beats_loss=0.01309, ecapa_loss=0.0002663, whisper_loss=0.09185, over 21758.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01183, ecapa_loss=0.0002416, whisper_loss=0.09671, over 3942565.99 frames. 
], batch size: 90, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:12:59,536 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-10 15:13:16,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=12.0 2024-08-10 15:13:39,894 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.056e+01 2.884e+01 3.216e+01 3.547e+01 5.586e+01, threshold=6.432e+01, percent-clipped=0.0 2024-08-10 15:13:40,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=615920.0, ans=0.125 2024-08-10 15:13:53,263 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 15:13:57,716 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 15:13:58,859 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 15:14:11,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3650, loss[loss=0.1306, beats_loss=0.009412, ecapa_loss=0.000296, whisper_loss=0.1183, over 14449.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01186, ecapa_loss=0.0002428, whisper_loss=0.09604, over 3902911.54 frames. ], batch size: 59, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:14:48,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.70 vs. limit=10.0 2024-08-10 15:15:23,216 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3700, loss[loss=0.1226, beats_loss=0.009324, ecapa_loss=0.0002735, whisper_loss=0.1105, over 19382.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01183, ecapa_loss=0.0002445, whisper_loss=0.09628, over 3895088.17 frames. 
], batch size: 79, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:15:45,447 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 15:15:57,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=616920.0, ans=0.07 2024-08-10 15:16:04,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.70 vs. limit=22.5 2024-08-10 15:16:05,218 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.736e+01 3.079e+01 3.558e+01 5.544e+01, threshold=6.157e+01, percent-clipped=0.0 2024-08-10 15:16:17,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2024-08-10 15:16:20,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=617020.0, ans=0.125 2024-08-10 15:16:27,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=617120.0, ans=0.125 2024-08-10 15:16:37,468 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3750, loss[loss=0.1017, beats_loss=0.01268, ecapa_loss=0.0002845, whisper_loss=0.08616, over 21769.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01184, ecapa_loss=0.0002448, whisper_loss=0.09628, over 3897349.67 frames. ], batch size: 90, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:16:48,567 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 7 from Vox, 31 fro AS 2024-08-10 15:16:53,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=617320.0, ans=0.07 2024-08-10 15:17:07,252 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
13 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 15:17:12,815 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 33 from Vox, 38 fro AS 2024-08-10 15:17:34,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=617620.0, ans=0.125 2024-08-10 15:17:39,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=617620.0, ans=0.1 2024-08-10 15:17:47,410 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.58 vs. limit=22.5 2024-08-10 15:17:49,461 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3800, loss[loss=0.125, beats_loss=0.01175, ecapa_loss=0.0002304, whisper_loss=0.1109, over 16696.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01193, ecapa_loss=0.0002446, whisper_loss=0.09576, over 3907543.60 frames. ], batch size: 68, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:17:55,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=617720.0, ans=0.0 2024-08-10 15:18:05,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=617820.0, ans=0.0 2024-08-10 15:18:07,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=617820.0, ans=0.125 2024-08-10 15:18:17,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=617920.0, ans=0.125 2024-08-10 15:18:30,986 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.844e+01 3.115e+01 3.732e+01 5.922e+01, threshold=6.230e+01, percent-clipped=0.0 2024-08-10 15:18:39,922 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
17 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 15:18:45,824 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 25 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-10 15:19:00,479 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 15:19:02,652 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3850, loss[loss=0.08601, beats_loss=0.01509, ecapa_loss=0.000174, whisper_loss=0.06918, over 16889.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01178, ecapa_loss=0.0002468, whisper_loss=0.09615, over 3880715.99 frames. ], batch size: 65, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:19:16,619 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-10 15:19:24,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=618320.0, ans=0.125 2024-08-10 15:19:39,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=618420.0, ans=0.125 2024-08-10 15:19:48,461 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 15:19:57,757 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-10 15:20:02,466 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 15:20:08,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=618520.0, ans=0.2 2024-08-10 15:20:24,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=618620.0, ans=0.2 2024-08-10 15:20:31,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3900, loss[loss=0.1127, beats_loss=0.01235, ecapa_loss=0.0002067, whisper_loss=0.09828, over 17143.00 frames. 
], tot_loss[loss=0.1107, beats_loss=0.01176, ecapa_loss=0.0002484, whisper_loss=0.09645, over 3867533.03 frames. ], batch size: 68, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:20:51,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618820.0, ans=0.1 2024-08-10 15:21:19,473 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 15:21:23,184 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:21:24,215 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.513e+01 3.061e+01 3.504e+01 4.098e+01 1.751e+02, threshold=7.008e+01, percent-clipped=3.0 2024-08-10 15:22:09,680 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.60 vs. limit=15.0 2024-08-10 15:22:11,704 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 3950, loss[loss=0.09953, beats_loss=0.01335, ecapa_loss=0.0002246, whisper_loss=0.08393, over 22454.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01181, ecapa_loss=0.0002489, whisper_loss=0.0965, over 3871950.90 frames. ], batch size: 89, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:22:12,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=619220.0, ans=0.1 2024-08-10 15:22:22,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=619220.0, ans=0.0 2024-08-10 15:22:24,293 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 15:22:27,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=619220.0, ans=0.0 2024-08-10 15:22:36,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=619320.0, ans=0.125 2024-08-10 15:22:59,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=619420.0, ans=0.125 2024-08-10 15:23:01,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.83 vs. limit=15.0 2024-08-10 15:23:22,672 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 11 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 15:23:51,742 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 15:23:54,590 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4000, loss[loss=0.1262, beats_loss=0.009564, ecapa_loss=0.0002238, whisper_loss=0.1144, over 15943.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01177, ecapa_loss=0.0002477, whisper_loss=0.09705, over 3864162.24 frames. ], batch size: 59, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:24:05,942 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 15:24:08,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=619720.0, ans=0.07 2024-08-10 15:24:19,035 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
19 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 15:24:28,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=619820.0, ans=0.125 2024-08-10 15:24:47,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=619920.0, ans=0.0 2024-08-10 15:24:47,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.11 vs. limit=22.5 2024-08-10 15:24:47,945 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-10 15:25:02,572 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 2.861e+01 3.318e+01 3.884e+01 5.554e+01, threshold=6.636e+01, percent-clipped=0.0 2024-08-10 15:25:15,101 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 15:25:26,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.52 vs. limit=22.5 2024-08-10 15:25:28,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=620120.0, ans=0.125 2024-08-10 15:25:35,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=620120.0, ans=0.125 2024-08-10 15:25:48,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=620120.0, ans=0.5 2024-08-10 15:25:52,353 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4050, loss[loss=0.0988, beats_loss=0.01366, ecapa_loss=0.0002495, whisper_loss=0.08265, over 20151.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01181, ecapa_loss=0.0002483, whisper_loss=0.09685, over 3866663.67 frames. 
], batch size: 85, lr: 1.27e-02, grad_scale: 68719476736.0 2024-08-10 15:26:37,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=620320.0, ans=0.0 2024-08-10 15:26:45,685 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-10 15:26:48,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=620420.0, ans=0.125 2024-08-10 15:27:10,916 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-08-10 15:27:22,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=620520.0, ans=0.2 2024-08-10 15:27:24,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=620520.0, ans=0.05 2024-08-10 15:27:40,759 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0 2024-08-10 15:27:50,465 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4100, loss[loss=0.1323, beats_loss=0.009506, ecapa_loss=0.0002374, whisper_loss=0.1204, over 21846.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01179, ecapa_loss=0.0002482, whisper_loss=0.09666, over 3880833.42 frames. ], batch size: 84, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:28:18,462 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 15:28:57,053 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
20 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-10 15:28:59,020 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.458e+01 2.982e+01 3.358e+01 3.918e+01 5.492e+01, threshold=6.716e+01, percent-clipped=0.0 2024-08-10 15:29:33,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=621220.0, ans=0.125 2024-08-10 15:29:34,324 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4150, loss[loss=0.07926, beats_loss=0.01228, ecapa_loss=0.0002343, whisper_loss=0.06464, over 17475.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01185, ecapa_loss=0.0002468, whisper_loss=0.09558, over 3870392.57 frames. ], batch size: 70, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:29:40,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=621220.0, ans=0.125 2024-08-10 15:29:46,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=621220.0, ans=0.0 2024-08-10 15:30:08,567 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 15:30:16,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=621420.0, ans=0.0 2024-08-10 15:30:22,942 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 15:30:28,698 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 15:30:37,755 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.662e-01 2024-08-10 15:30:49,027 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4200, loss[loss=0.09641, beats_loss=0.01392, ecapa_loss=0.0002246, whisper_loss=0.08025, over 13256.00 frames. 
], tot_loss[loss=0.1102, beats_loss=0.01181, ecapa_loss=0.0002469, whisper_loss=0.09587, over 3887446.20 frames. ], batch size: 53, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:30:55,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621720.0, ans=0.1 2024-08-10 15:31:11,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=621820.0, ans=0.125 2024-08-10 15:31:14,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621820.0, ans=0.1 2024-08-10 15:31:16,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=621820.0, ans=0.0 2024-08-10 15:31:22,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621920.0, ans=0.1 2024-08-10 15:31:31,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.809e+01 3.141e+01 3.651e+01 6.704e+01, threshold=6.282e+01, percent-clipped=0.0 2024-08-10 15:31:31,404 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 15:31:42,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=622020.0, ans=0.125 2024-08-10 15:31:56,128 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.24 vs. 
limit=22.5 2024-08-10 15:31:58,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=622120.0, ans=0.125 2024-08-10 15:32:01,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=622120.0, ans=0.125 2024-08-10 15:32:05,471 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4250, loss[loss=0.1119, beats_loss=0.01257, ecapa_loss=0.0002517, whisper_loss=0.09682, over 22399.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01182, ecapa_loss=0.0002474, whisper_loss=0.09547, over 3876647.53 frames. ], batch size: 88, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:32:09,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.57 vs. limit=22.5 2024-08-10 15:32:10,067 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 15:32:10,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=622220.0, ans=0.0 2024-08-10 15:32:11,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.82 vs. limit=15.0 2024-08-10 15:32:22,152 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 15:32:23,091 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.98 vs. 
limit=15.0 2024-08-10 15:32:32,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622320.0, ans=0.1 2024-08-10 15:32:37,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=622420.0, ans=0.125 2024-08-10 15:32:45,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=622420.0, ans=0.1 2024-08-10 15:32:54,631 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.86 vs. limit=22.5 2024-08-10 15:33:11,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=622620.0, ans=0.125 2024-08-10 15:33:16,588 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 15:33:19,205 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4300, loss[loss=0.1101, beats_loss=0.01217, ecapa_loss=0.0003026, whisper_loss=0.09493, over 21524.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01178, ecapa_loss=0.0002454, whisper_loss=0.09509, over 3884892.72 frames. ], batch size: 93, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:33:25,335 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 15:33:27,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622720.0, ans=0.1 2024-08-10 15:33:32,844 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-10 15:33:33,136 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.220e+05 2024-08-10 15:33:40,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=622820.0, ans=0.0 2024-08-10 15:33:41,596 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 15:33:59,924 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.798e+01 3.084e+01 3.774e+01 7.124e+01, threshold=6.168e+01, percent-clipped=2.0 2024-08-10 15:34:05,424 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 15:34:27,927 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-10 15:34:29,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=623220.0, ans=0.125 2024-08-10 15:34:30,486 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4350, loss[loss=0.1048, beats_loss=0.009837, ecapa_loss=0.0002682, whisper_loss=0.09224, over 19219.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01186, ecapa_loss=0.0002461, whisper_loss=0.09404, over 3866774.93 frames. ], batch size: 75, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:34:33,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=623220.0, ans=0.2 2024-08-10 15:34:47,325 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 15:34:53,245 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 15:34:55,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=623320.0, ans=10.0 2024-08-10 15:34:56,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=623320.0, ans=0.125 2024-08-10 15:35:07,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=623420.0, ans=0.125 2024-08-10 15:35:29,554 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0 2024-08-10 15:35:44,932 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 15:35:50,727 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4400, loss[loss=0.118, beats_loss=0.01258, ecapa_loss=0.0002512, whisper_loss=0.1029, over 22311.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01186, ecapa_loss=0.0002445, whisper_loss=0.09429, over 3876141.21 frames. ], batch size: 92, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:35:52,517 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 15:36:01,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=623720.0, ans=0.1 2024-08-10 15:36:06,722 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 15:36:22,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=623920.0, ans=0.09899494936611666 2024-08-10 15:36:31,238 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 15:36:32,004 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.15 vs. limit=6.0 2024-08-10 15:36:33,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=623920.0, ans=0.0 2024-08-10 15:36:38,451 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.993e+01 3.424e+01 4.007e+01 6.509e+01, threshold=6.848e+01, percent-clipped=2.0 2024-08-10 15:36:40,516 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 15:36:50,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=624020.0, ans=0.0 2024-08-10 15:37:02,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=624120.0, ans=0.0 2024-08-10 15:37:06,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=624120.0, ans=0.1 2024-08-10 15:37:08,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=624120.0, ans=0.125 2024-08-10 15:37:10,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=624120.0, ans=0.015 2024-08-10 15:37:15,561 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4450, loss[loss=0.1311, beats_loss=0.009869, ecapa_loss=0.0002747, whisper_loss=0.1185, over 17024.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01187, ecapa_loss=0.0002445, whisper_loss=0.094, over 3860182.60 frames. ], batch size: 69, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:37:26,625 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
19 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-10 15:37:40,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=624320.0, ans=0.05 2024-08-10 15:37:51,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624420.0, ans=0.1 2024-08-10 15:37:53,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=624420.0, ans=0.125 2024-08-10 15:37:59,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=624420.0, ans=0.1 2024-08-10 15:38:09,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=22.5 2024-08-10 15:38:32,708 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 15:38:39,165 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4500, loss[loss=0.1241, beats_loss=0.01179, ecapa_loss=0.0002249, whisper_loss=0.1101, over 22641.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01191, ecapa_loss=0.000244, whisper_loss=0.09428, over 3879824.01 frames. ], batch size: 85, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:39:15,935 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 15:39:27,364 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 2.908e+01 3.221e+01 3.849e+01 6.109e+01, threshold=6.442e+01, percent-clipped=0.0 2024-08-10 15:39:31,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=625020.0, ans=0.0 2024-08-10 15:39:52,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=625120.0, ans=0.125 2024-08-10 15:39:55,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=625120.0, ans=0.125 2024-08-10 15:39:58,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=625120.0, ans=0.125 2024-08-10 15:40:02,923 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 15:40:05,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4550, loss[loss=0.1196, beats_loss=0.00991, ecapa_loss=0.0002539, whisper_loss=0.1072, over 17665.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01186, ecapa_loss=0.0002458, whisper_loss=0.0948, over 3874442.30 frames. 
], batch size: 67, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:40:06,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625220.0, ans=0.1 2024-08-10 15:40:26,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=625320.0, ans=0.125 2024-08-10 15:40:29,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=625320.0, ans=0.125 2024-08-10 15:41:10,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=625620.0, ans=0.125 2024-08-10 15:41:23,321 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4600, loss[loss=0.1166, beats_loss=0.01269, ecapa_loss=0.0002409, whisper_loss=0.1015, over 22566.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01194, ecapa_loss=0.0002446, whisper_loss=0.09488, over 3876760.76 frames. ], batch size: 92, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:41:26,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=625720.0, ans=0.07 2024-08-10 15:41:46,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625820.0, ans=0.1 2024-08-10 15:41:54,347 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 15:42:07,330 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.652e+01 3.147e+01 3.453e+01 6.048e+01, threshold=6.293e+01, percent-clipped=0.0 2024-08-10 15:42:09,082 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 15:42:25,381 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
25 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-10 15:42:34,491 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 15:42:42,293 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4650, loss[loss=0.1297, beats_loss=0.009857, ecapa_loss=0.0002623, whisper_loss=0.1172, over 23216.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01202, ecapa_loss=0.0002429, whisper_loss=0.0942, over 3897785.02 frames. ], batch size: 94, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:42:49,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626220.0, ans=0.1 2024-08-10 15:42:54,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=626220.0, ans=0.125 2024-08-10 15:43:28,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.17 vs. limit=10.0 2024-08-10 15:43:30,618 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 15:43:35,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=626520.0, ans=0.0 2024-08-10 15:43:40,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=626520.0, ans=0.0 2024-08-10 15:44:03,825 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4700, loss[loss=0.1168, beats_loss=0.01133, ecapa_loss=0.0003004, whisper_loss=0.1025, over 20680.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01194, ecapa_loss=0.0002437, whisper_loss=0.09499, over 3906902.72 frames. 
], batch size: 91, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:44:29,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=626820.0, ans=0.0 2024-08-10 15:44:29,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=626820.0, ans=0.04949747468305833 2024-08-10 15:44:32,400 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 15:44:33,987 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 34 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 15:44:41,574 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-10 15:44:48,479 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.667e+01 3.139e+01 3.783e+01 7.574e+01, threshold=6.278e+01, percent-clipped=1.0 2024-08-10 15:44:57,941 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-10 15:45:05,688 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 23 from LS+wenet, 9 from Vox, 21 fro AS 2024-08-10 15:45:09,806 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-10 15:45:24,987 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4750, loss[loss=0.0949, beats_loss=0.01441, ecapa_loss=0.0002117, whisper_loss=0.07837, over 21929.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.0119, ecapa_loss=0.0002426, whisper_loss=0.09495, over 3903549.07 frames. ], batch size: 88, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:45:41,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=627320.0, ans=0.125 2024-08-10 15:45:44,426 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
18 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-10 15:45:46,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=627320.0, ans=0.0 2024-08-10 15:46:47,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4800, loss[loss=0.1052, beats_loss=0.01299, ecapa_loss=0.0002106, whisper_loss=0.09006, over 13969.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01187, ecapa_loss=0.0002435, whisper_loss=0.09495, over 3865932.37 frames. ], batch size: 53, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:47:00,560 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-10 15:47:10,431 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 15:47:17,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=627820.0, ans=0.125 2024-08-10 15:47:19,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=627820.0, ans=0.0 2024-08-10 15:47:29,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=627920.0, ans=0.125 2024-08-10 15:47:35,605 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.946e+01 3.351e+01 4.117e+01 7.010e+01, threshold=6.703e+01, percent-clipped=2.0 2024-08-10 15:47:47,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=628020.0, ans=0.2 2024-08-10 15:47:55,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=628120.0, ans=0.1 2024-08-10 15:47:56,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.18 vs. 
limit=22.5 2024-08-10 15:48:12,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4850, loss[loss=0.1208, beats_loss=0.01227, ecapa_loss=0.0002405, whisper_loss=0.1061, over 22637.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01188, ecapa_loss=0.0002419, whisper_loss=0.09563, over 3914297.96 frames. ], batch size: 89, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:48:24,428 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 15:48:32,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=628320.0, ans=0.125 2024-08-10 15:48:44,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=628320.0, ans=0.125 2024-08-10 15:49:10,651 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 40 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-10 15:49:35,872 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4900, loss[loss=0.1006, beats_loss=0.01157, ecapa_loss=0.0001929, whisper_loss=0.08713, over 19754.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01191, ecapa_loss=0.0002417, whisper_loss=0.09513, over 3899137.25 frames. ], batch size: 79, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:49:45,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=628720.0, ans=0.125 2024-08-10 15:49:45,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=628720.0, ans=0.125 2024-08-10 15:49:47,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=628720.0, ans=0.125 2024-08-10 15:49:50,166 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
26 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 15:49:50,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=628820.0, ans=0.0 2024-08-10 15:50:03,470 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 15:50:11,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=628920.0, ans=0.125 2024-08-10 15:50:12,471 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 15:50:19,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.794e+01 3.081e+01 3.669e+01 6.406e+01, threshold=6.163e+01, percent-clipped=0.0 2024-08-10 15:50:31,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=629020.0, ans=0.2 2024-08-10 15:50:54,924 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 4950, loss[loss=0.1179, beats_loss=0.01275, ecapa_loss=0.0001465, whisper_loss=0.1037, over 18127.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01192, ecapa_loss=0.0002411, whisper_loss=0.09527, over 3881257.69 frames. ], batch size: 66, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:51:04,638 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 15:51:43,583 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 15:51:53,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=629520.0, ans=0.2 2024-08-10 15:52:15,803 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5000, loss[loss=0.1272, beats_loss=0.009453, ecapa_loss=0.0002545, whisper_loss=0.1152, over 19132.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01182, ecapa_loss=0.0002421, whisper_loss=0.09574, over 3882594.22 frames. 
], batch size: 73, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:52:27,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=629720.0, ans=0.0 2024-08-10 15:52:52,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=629920.0, ans=0.125 2024-08-10 15:53:04,078 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.298e+01 2.939e+01 3.385e+01 3.961e+01 1.332e+02, threshold=6.770e+01, percent-clipped=1.0 2024-08-10 15:53:11,905 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-10 15:53:13,593 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 15:53:13,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=630020.0, ans=0.125 2024-08-10 15:53:30,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=630120.0, ans=0.125 2024-08-10 15:53:37,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5050, loss[loss=0.08461, beats_loss=0.01209, ecapa_loss=0.00024, whisper_loss=0.07012, over 13312.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01183, ecapa_loss=0.0002426, whisper_loss=0.09593, over 3890469.39 frames. ], batch size: 55, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:54:01,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=630320.0, ans=0.1 2024-08-10 15:54:18,843 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 15:54:37,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=630520.0, ans=0.125 2024-08-10 15:54:45,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=630620.0, ans=0.0 2024-08-10 15:54:48,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630620.0, ans=0.1 2024-08-10 15:54:52,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630620.0, ans=0.1 2024-08-10 15:54:58,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=630720.0, ans=0.0 2024-08-10 15:54:59,612 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5100, loss[loss=0.0968, beats_loss=0.01041, ecapa_loss=0.0002521, whisper_loss=0.08387, over 23220.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01174, ecapa_loss=0.000243, whisper_loss=0.09658, over 3869679.33 frames. ], batch size: 94, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:55:13,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=630720.0, ans=0.125 2024-08-10 15:55:15,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=630820.0, ans=0.125 2024-08-10 15:55:27,779 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-10 15:55:29,643 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 15:55:44,833 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.958e+01 3.434e+01 3.932e+01 6.642e+01, threshold=6.868e+01, percent-clipped=0.0 2024-08-10 15:55:52,933 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-10 15:56:14,267 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 15:56:20,712 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5150, loss[loss=0.09475, beats_loss=0.01416, ecapa_loss=0.000244, whisper_loss=0.07815, over 17518.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01176, ecapa_loss=0.0002415, whisper_loss=0.09692, over 3862165.55 frames. ], batch size: 74, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:56:21,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=631220.0, ans=0.0 2024-08-10 15:56:39,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=15.0 2024-08-10 15:56:40,582 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.12 vs. limit=15.0 2024-08-10 15:56:48,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=631420.0, ans=0.0 2024-08-10 15:56:51,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=631420.0, ans=0.2 2024-08-10 15:57:06,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=631520.0, ans=0.1 2024-08-10 15:57:15,465 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
22 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-10 15:57:26,601 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.59 vs. limit=22.5 2024-08-10 15:57:37,778 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5200, loss[loss=0.1116, beats_loss=0.01154, ecapa_loss=0.0001984, whisper_loss=0.09808, over 18779.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01183, ecapa_loss=0.0002403, whisper_loss=0.09657, over 3867639.43 frames. ], batch size: 73, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:57:48,018 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-10 15:57:53,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=631820.0, ans=0.1 2024-08-10 15:58:11,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=631920.0, ans=0.125 2024-08-10 15:58:19,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.945e+01 3.443e+01 4.072e+01 7.195e+01, threshold=6.886e+01, percent-clipped=1.0 2024-08-10 15:58:24,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=632020.0, ans=0.125 2024-08-10 15:58:46,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=632120.0, ans=0.125 2024-08-10 15:58:51,635 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5250, loss[loss=0.08244, beats_loss=0.01523, ecapa_loss=0.00025, whisper_loss=0.06471, over 17281.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01187, ecapa_loss=0.0002398, whisper_loss=0.09577, over 3853329.96 frames. 
], batch size: 73, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:58:54,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=632220.0, ans=0.125 2024-08-10 15:59:00,504 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 15:59:10,709 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:59:15,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632320.0, ans=0.1 2024-08-10 15:59:28,676 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 15:59:36,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=632420.0, ans=0.2 2024-08-10 15:59:37,084 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 15:59:42,865 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 15:59:56,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=632620.0, ans=0.0 2024-08-10 16:00:03,528 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 16:00:07,040 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5300, loss[loss=0.1157, beats_loss=0.008285, ecapa_loss=0.0002746, whisper_loss=0.1046, over 13691.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01177, ecapa_loss=0.0002417, whisper_loss=0.09572, over 3853367.69 frames. ], batch size: 54, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:00:11,425 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
19 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-10 16:00:27,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.24 vs. limit=15.0 2024-08-10 16:00:29,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=15.0 2024-08-10 16:00:36,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=632920.0, ans=0.0 2024-08-10 16:00:47,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.840e+01 3.204e+01 3.763e+01 6.547e+01, threshold=6.407e+01, percent-clipped=0.0 2024-08-10 16:00:55,881 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-10 16:01:09,330 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 16:01:16,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=633120.0, ans=0.125 2024-08-10 16:01:18,670 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5350, loss[loss=0.1237, beats_loss=0.009733, ecapa_loss=0.0002058, whisper_loss=0.1119, over 23609.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01177, ecapa_loss=0.00024, whisper_loss=0.09607, over 3851842.75 frames. 
], batch size: 87, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:01:26,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=633220.0, ans=0.0 2024-08-10 16:01:28,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=633220.0, ans=0.0 2024-08-10 16:01:28,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.21 vs. limit=6.0 2024-08-10 16:01:29,616 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2024-08-10 16:01:30,711 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 16:01:50,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=633420.0, ans=0.125 2024-08-10 16:01:51,435 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=22.5 2024-08-10 16:01:59,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633520.0, ans=0.1 2024-08-10 16:02:05,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=633520.0, ans=0.0 2024-08-10 16:02:15,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=633620.0, ans=0.2 2024-08-10 16:02:15,815 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. 
limit=15.0 2024-08-10 16:02:23,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633620.0, ans=0.1 2024-08-10 16:02:28,563 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5400, loss[loss=0.1472, beats_loss=0.008314, ecapa_loss=0.0002333, whisper_loss=0.1365, over 19772.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01174, ecapa_loss=0.0002387, whisper_loss=0.09637, over 3876348.07 frames. ], batch size: 73, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:02:30,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=633720.0, ans=0.125 2024-08-10 16:02:39,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=633720.0, ans=0.125 2024-08-10 16:02:46,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=633820.0, ans=0.125 2024-08-10 16:02:47,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.19 vs. limit=22.5 2024-08-10 16:02:48,106 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 16:03:07,445 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.970e+01 3.287e+01 3.858e+01 5.350e+01, threshold=6.575e+01, percent-clipped=0.0 2024-08-10 16:03:11,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=634020.0, ans=0.09899494936611666 2024-08-10 16:03:30,808 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 16:03:37,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5450, loss[loss=0.1084, beats_loss=0.0133, ecapa_loss=0.0002147, whisper_loss=0.09295, over 22981.00 frames. 
], tot_loss[loss=0.1111, beats_loss=0.01172, ecapa_loss=0.0002375, whisper_loss=0.09702, over 3912828.08 frames. ], batch size: 93, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:03:47,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=634220.0, ans=0.0 2024-08-10 16:04:11,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=634420.0, ans=0.0 2024-08-10 16:04:11,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634420.0, ans=0.1 2024-08-10 16:04:27,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=634520.0, ans=0.125 2024-08-10 16:04:28,888 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 16:04:39,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.47 vs. limit=15.0 2024-08-10 16:04:43,454 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-10 16:04:44,556 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5500, loss[loss=0.1272, beats_loss=0.008901, ecapa_loss=0.0002503, whisper_loss=0.1158, over 19127.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01167, ecapa_loss=0.0002379, whisper_loss=0.0972, over 3900145.10 frames. ], batch size: 74, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:04:49,721 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 16:04:54,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=634720.0, ans=0.125 2024-08-10 16:04:55,301 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2024-08-10 16:04:56,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=634720.0, ans=0.2 2024-08-10 16:05:05,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=634820.0, ans=0.0 2024-08-10 16:05:09,565 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2024-08-10 16:05:22,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.955e+01 3.201e+01 3.849e+01 6.033e+01, threshold=6.402e+01, percent-clipped=0.0 2024-08-10 16:05:25,345 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 16:05:35,914 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 16:05:52,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5550, loss[loss=0.1305, beats_loss=0.009665, ecapa_loss=0.0002827, whisper_loss=0.118, over 13824.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01175, ecapa_loss=0.0002385, whisper_loss=0.097, over 3908246.43 frames. ], batch size: 54, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:05:53,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635220.0, ans=0.1 2024-08-10 16:06:03,892 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 16:06:11,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=635320.0, ans=0.125 2024-08-10 16:06:14,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=635320.0, ans=0.125 2024-08-10 16:06:26,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=635420.0, ans=0.0 2024-08-10 16:06:42,866 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-10 16:06:58,678 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5600, loss[loss=0.1349, beats_loss=0.008638, ecapa_loss=0.0002461, whisper_loss=0.1238, over 20364.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01172, ecapa_loss=0.00024, whisper_loss=0.09656, over 3930636.50 frames. ], batch size: 80, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:07:04,153 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 16:07:08,004 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 13 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 16:07:14,684 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
17 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-10 16:07:17,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=635820.0, ans=0.125 2024-08-10 16:07:17,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=635820.0, ans=0.0 2024-08-10 16:07:28,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=635920.0, ans=0.125 2024-08-10 16:07:34,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=635920.0, ans=0.0 2024-08-10 16:07:35,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.708e+01 3.041e+01 3.496e+01 5.299e+01, threshold=6.081e+01, percent-clipped=0.0 2024-08-10 16:07:44,842 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 16:07:59,616 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 16:08:01,507 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2024-08-10 16:08:04,668 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5650, loss[loss=0.1074, beats_loss=0.0138, ecapa_loss=0.0002268, whisper_loss=0.09129, over 22692.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01178, ecapa_loss=0.000239, whisper_loss=0.09609, over 3955551.80 frames. ], batch size: 91, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:08:08,741 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 16:08:11,281 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 16:08:15,045 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
18 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-10 16:08:19,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=636320.0, ans=0.04949747468305833 2024-08-10 16:08:24,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=636320.0, ans=0.5 2024-08-10 16:08:27,082 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 16:08:30,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=636420.0, ans=0.0 2024-08-10 16:08:35,179 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-10 16:08:48,584 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 16:08:54,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=636520.0, ans=0.04949747468305833 2024-08-10 16:09:01,794 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 16:09:10,484 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5700, loss[loss=0.11, beats_loss=0.0109, ecapa_loss=0.0002991, whisper_loss=0.09613, over 18336.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01183, ecapa_loss=0.0002392, whisper_loss=0.09576, over 3934090.94 frames. 
], batch size: 75, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:09:10,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=636720.0, ans=0.0 2024-08-10 16:09:38,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=636920.0, ans=0.0 2024-08-10 16:09:44,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=636920.0, ans=0.125 2024-08-10 16:09:48,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.928e+01 3.301e+01 4.183e+01 7.157e+01, threshold=6.602e+01, percent-clipped=2.0 2024-08-10 16:09:52,866 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 16:10:01,140 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-10 16:10:19,243 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5750, loss[loss=0.08326, beats_loss=0.01303, ecapa_loss=0.0002733, whisper_loss=0.06749, over 16215.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01183, ecapa_loss=0.0002391, whisper_loss=0.09582, over 3924392.41 frames. ], batch size: 69, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:10:19,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0 2024-08-10 16:10:32,228 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 16:10:44,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. 
limit=15.0 2024-08-10 16:10:52,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=637420.0, ans=0.05 2024-08-10 16:10:58,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=637420.0, ans=0.2 2024-08-10 16:11:12,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=637520.0, ans=0.0 2024-08-10 16:11:15,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=637620.0, ans=15.0 2024-08-10 16:11:28,428 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5800, loss[loss=0.1028, beats_loss=0.01494, ecapa_loss=0.0002107, whisper_loss=0.08579, over 18895.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01188, ecapa_loss=0.0002394, whisper_loss=0.09559, over 3918244.86 frames. ], batch size: 77, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:11:45,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=637820.0, ans=0.125 2024-08-10 16:11:47,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=637820.0, ans=0.125 2024-08-10 16:11:55,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=637920.0, ans=0.2 2024-08-10 16:11:58,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=637920.0, ans=0.2 2024-08-10 16:12:07,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.715e+01 3.192e+01 3.464e+01 4.938e+01, threshold=6.385e+01, percent-clipped=0.0 2024-08-10 16:12:15,642 INFO [scaling.py:1024] (2/4) Whitening: 
name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2024-08-10 16:12:16,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=638020.0, ans=0.0 2024-08-10 16:12:38,571 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5850, loss[loss=0.09778, beats_loss=0.01136, ecapa_loss=0.0002462, whisper_loss=0.08396, over 15500.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01197, ecapa_loss=0.0002398, whisper_loss=0.09515, over 3926325.43 frames. ], batch size: 62, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:12:41,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=638220.0, ans=0.125 2024-08-10 16:12:45,113 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 16:12:58,200 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 16:13:05,956 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 13 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 16:13:11,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=638420.0, ans=0.125 2024-08-10 16:13:17,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=638420.0, ans=0.125 2024-08-10 16:13:24,914 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-10 16:13:26,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=638520.0, ans=15.0 2024-08-10 16:13:38,049 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.83 vs. 
limit=22.5 2024-08-10 16:13:40,124 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 14 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 16:13:48,178 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5900, loss[loss=0.09969, beats_loss=0.01321, ecapa_loss=0.0002166, whisper_loss=0.08432, over 21259.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.0119, ecapa_loss=0.0002403, whisper_loss=0.09495, over 3891369.32 frames. ], batch size: 83, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:14:01,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=638820.0, ans=0.125 2024-08-10 16:14:05,014 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 16:14:05,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=638820.0, ans=0.125 2024-08-10 16:14:10,139 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-10 16:14:14,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=638920.0, ans=0.125 2024-08-10 16:14:25,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=638920.0, ans=0.0 2024-08-10 16:14:26,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.150e+01 2.959e+01 3.304e+01 3.845e+01 6.831e+01, threshold=6.608e+01, percent-clipped=1.0 2024-08-10 16:14:29,400 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
17 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 16:14:36,587 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:14:36,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=639020.0, ans=0.2 2024-08-10 16:14:39,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=639020.0, ans=0.0 2024-08-10 16:14:47,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=639120.0, ans=0.2 2024-08-10 16:14:53,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639120.0, ans=0.1 2024-08-10 16:14:56,601 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 5950, loss[loss=0.1132, beats_loss=0.01167, ecapa_loss=0.000255, whisper_loss=0.09903, over 19675.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01193, ecapa_loss=0.0002414, whisper_loss=0.09494, over 3885638.86 frames. ], batch size: 79, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:15:26,330 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.080e-01 2024-08-10 16:15:30,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639420.0, ans=0.1 2024-08-10 16:15:30,794 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.01 vs. 
limit=22.5 2024-08-10 16:15:32,174 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.498e+03 2024-08-10 16:15:47,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=639520.0, ans=0.0 2024-08-10 16:15:51,523 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 16:16:01,509 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-10 16:16:04,641 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 10 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 16:16:07,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6000, loss[loss=0.1185, beats_loss=0.01323, ecapa_loss=0.000191, whisper_loss=0.1034, over 16836.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01192, ecapa_loss=0.0002414, whisper_loss=0.09528, over 3874003.96 frames. ], batch size: 62, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:16:07,868 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 16:16:49,324 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on ASR_libri: loss=0.2642, beats_loss=0, ecapa_loss=0.0007414, whisper_loss=0.2567, over 922467.00 frames. 2024-08-10 16:17:08,516 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on SV_voxceleb1: loss=0.006164, beats_loss=0, ecapa_loss=0.0006164, whisper_loss=0, over 939242.00 frames. 2024-08-10 16:19:02,574 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on AT_audioset: loss=0.02682, beats_loss=0.02682, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 16:19:02,578 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB
2024-08-10 16:19:13,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=639720.0, ans=0.125
2024-08-10 16:19:18,518 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS
2024-08-10 16:19:25,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=639820.0, ans=0.2
2024-08-10 16:19:29,057 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0
2024-08-10 16:19:36,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.31 vs. limit=15.0
2024-08-10 16:19:45,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.869e+01 3.209e+01 3.631e+01 6.157e+01, threshold=6.418e+01, percent-clipped=0.0
2024-08-10 16:19:48,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=640020.0, ans=0.125
2024-08-10 16:19:57,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0
2024-08-10 16:20:08,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0
2024-08-10 16:20:15,494 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6050, loss[loss=0.111, beats_loss=0.01152, ecapa_loss=0.0002661, whisper_loss=0.09679, over 19934.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01189, ecapa_loss=0.0002412, whisper_loss=0.09558, over 3881301.09 frames. ], batch size: 80, lr: 1.25e-02, grad_scale: 137438953472.0
2024-08-10 16:20:22,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=640220.0, ans=0.125
2024-08-10 16:20:47,221 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 from AS
2024-08-10 16:20:57,632 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 from AS
2024-08-10 16:21:09,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=640520.0, ans=0.05
2024-08-10 16:21:14,933 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 21 from LS+wenet, 31 from Vox, 35 from AS
2024-08-10 16:21:32,213 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6100, loss[loss=0.1042, beats_loss=0.01134, ecapa_loss=0.000251, whisper_loss=0.0903, over 20944.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01183, ecapa_loss=0.0002422, whisper_loss=0.09541, over 3897155.58 frames. ], batch size: 84, lr: 1.25e-02, grad_scale: 137438953472.0
2024-08-10 16:21:41,801 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 from AS
2024-08-10 16:21:47,786 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0
2024-08-10 16:21:49,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=640820.0, ans=0.125
2024-08-10 16:21:55,121 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 25 from Vox, 30 from AS
2024-08-10 16:22:15,574 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 3.045e+01 3.489e+01 4.204e+01 8.442e+01, threshold=6.977e+01, percent-clipped=4.0
2024-08-10 16:22:22,672 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 13 from Vox, 27 from AS
2024-08-10 16:22:33,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.93 vs. limit=22.5
2024-08-10 16:22:37,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=641120.0, ans=0.2
2024-08-10 16:22:48,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6150, loss[loss=0.08004, beats_loss=0.01119, ecapa_loss=0.0002975, whisper_loss=0.06587, over 13005.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01181, ecapa_loss=0.0002418, whisper_loss=0.09548, over 3886312.31 frames. ], batch size: 55, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:22:48,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=641220.0, ans=0.125
2024-08-10 16:23:05,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0
2024-08-10 16:23:06,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.21 vs. limit=15.0
2024-08-10 16:23:12,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=641320.0, ans=0.125
2024-08-10 16:23:15,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=641320.0, ans=0.125
2024-08-10 16:23:18,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=641420.0, ans=0.0
2024-08-10 16:23:21,418 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 from AS
2024-08-10 16:23:35,887 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 from AS
2024-08-10 16:23:58,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=641620.0, ans=0.2
2024-08-10 16:23:59,974 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.01 vs. limit=15.0
2024-08-10 16:24:03,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6200, loss[loss=0.08705, beats_loss=0.01394, ecapa_loss=0.0002674, whisper_loss=0.07044, over 14972.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01195, ecapa_loss=0.0002422, whisper_loss=0.09495, over 3912331.37 frames. ], batch size: 65, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:24:11,170 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 from AS
2024-08-10 16:24:11,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=641720.0, ans=0.125
2024-08-10 16:24:25,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=641820.0, ans=0.0
2024-08-10 16:24:32,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=641920.0, ans=0.125
2024-08-10 16:24:40,418 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0
2024-08-10 16:24:42,626 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.777e+01 3.185e+01 3.780e+01 9.777e+01, threshold=6.369e+01, percent-clipped=1.0
2024-08-10 16:24:45,701 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 36 from LS+wenet, 14 from Vox, 30 from AS
2024-08-10 16:24:59,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642120.0, ans=0.1
2024-08-10 16:25:04,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=642120.0, ans=0.0
2024-08-10 16:25:16,330 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6250, loss[loss=0.1218, beats_loss=0.01169, ecapa_loss=0.0002661, whisper_loss=0.1075, over 18633.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01198, ecapa_loss=0.0002397, whisper_loss=0.09476, over 3895515.77 frames. ], batch size: 77, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:25:18,914 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 from AS
2024-08-10 16:25:36,701 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 from AS
2024-08-10 16:26:04,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=642520.0, ans=0.2
2024-08-10 16:26:12,223 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 from AS
2024-08-10 16:26:12,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=642520.0, ans=0.125
2024-08-10 16:26:15,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=642620.0, ans=0.125
2024-08-10 16:26:18,800 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 from AS
2024-08-10 16:26:24,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=642620.0, ans=0.125
2024-08-10 16:26:31,270 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6300, loss[loss=0.108, beats_loss=0.0132, ecapa_loss=0.0002625, whisper_loss=0.09217, over 15581.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01196, ecapa_loss=0.0002416, whisper_loss=0.09369, over 3842319.08 frames. ], batch size: 64, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:26:35,647 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 from AS
2024-08-10 16:26:46,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=642820.0, ans=0.125
2024-08-10 16:26:48,796 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.39 vs. limit=10.0
2024-08-10 16:26:49,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=642820.0, ans=0.125
2024-08-10 16:27:11,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=642920.0, ans=0.125
2024-08-10 16:27:12,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=642920.0, ans=0.1
2024-08-10 16:27:14,972 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.923e+01 3.266e+01 3.609e+01 6.240e+01, threshold=6.531e+01, percent-clipped=0.0
2024-08-10 16:27:17,227 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0
2024-08-10 16:27:17,829 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 24 from LS+wenet, 26 from Vox, 45 from AS
2024-08-10 16:27:37,676 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 26 from Vox, 32 from AS
2024-08-10 16:27:46,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6350, loss[loss=0.1267, beats_loss=0.009811, ecapa_loss=0.0002352, whisper_loss=0.1145, over 16064.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01196, ecapa_loss=0.0002428, whisper_loss=0.09417, over 3855650.39 frames. ], batch size: 61, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:27:50,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=643220.0, ans=0.125
2024-08-10 16:27:57,458 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 from AS
2024-08-10 16:28:00,197 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 from AS
2024-08-10 16:28:06,162 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 13 from Vox, 29 from AS
2024-08-10 16:28:09,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=643320.0, ans=0.125
2024-08-10 16:28:11,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=643320.0, ans=10.0
2024-08-10 16:28:28,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643520.0, ans=0.1
2024-08-10 16:28:35,987 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 15 from Vox, 28 from AS
2024-08-10 16:28:36,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643520.0, ans=0.1
2024-08-10 16:28:43,834 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.066e-03
2024-08-10 16:28:49,198 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 18 from Vox, 31 from AS
2024-08-10 16:28:57,869 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6400, loss[loss=0.09202, beats_loss=0.01022, ecapa_loss=0.0003272, whisper_loss=0.07853, over 16809.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01194, ecapa_loss=0.0002416, whisper_loss=0.09462, over 3859161.02 frames. ], batch size: 70, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:29:06,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=643720.0, ans=0.125
2024-08-10 16:29:07,300 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 26 from Vox, 45 from AS
2024-08-10 16:29:17,564 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 15 from Vox, 40 from AS
2024-08-10 16:29:17,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=643820.0, ans=0.0
2024-08-10 16:29:22,453 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 21 from Vox, 38 from AS
2024-08-10 16:29:36,869 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.802e+01 3.219e+01 3.654e+01 6.592e+01, threshold=6.438e+01, percent-clipped=1.0
2024-08-10 16:29:46,781 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 from AS
2024-08-10 16:29:49,284 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 from AS
2024-08-10 16:29:52,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=644120.0, ans=0.04949747468305833
2024-08-10 16:30:00,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=644120.0, ans=0.0
2024-08-10 16:30:07,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6450, loss[loss=0.136, beats_loss=0.009065, ecapa_loss=0.0002472, whisper_loss=0.1245, over 23413.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01195, ecapa_loss=0.0002403, whisper_loss=0.09451, over 3889949.76 frames. ], batch size: 92, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:30:11,365 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 27 from Vox, 29 from AS
2024-08-10 16:30:17,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=644220.0, ans=0.1
2024-08-10 16:30:19,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.88 vs. limit=15.0
2024-08-10 16:30:33,675 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 from AS
2024-08-10 16:30:34,853 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 from AS
2024-08-10 16:30:36,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=644420.0, ans=0.0
2024-08-10 16:30:41,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=644420.0, ans=0.125
2024-08-10 16:30:57,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=644520.0, ans=0.125
2024-08-10 16:30:58,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=644520.0, ans=0.125
2024-08-10 16:31:14,627 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6500, loss[loss=0.1111, beats_loss=0.01324, ecapa_loss=0.0002393, whisper_loss=0.09544, over 18719.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01194, ecapa_loss=0.0002399, whisper_loss=0.09486, over 3891516.97 frames. ], batch size: 77, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:31:33,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.14 vs. limit=15.0
2024-08-10 16:31:38,155 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 from AS
2024-08-10 16:31:53,162 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 3.031e+01 3.251e+01 3.712e+01 6.418e+01, threshold=6.501e+01, percent-clipped=0.0
2024-08-10 16:32:05,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=645020.0, ans=0.125
2024-08-10 16:32:06,345 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.11 vs. limit=15.0
2024-08-10 16:32:09,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=645120.0, ans=15.0
2024-08-10 16:32:12,477 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.34 vs. limit=22.5
2024-08-10 16:32:14,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=645120.0, ans=0.125
2024-08-10 16:32:23,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6550, loss[loss=0.0936, beats_loss=0.01543, ecapa_loss=0.0002249, whisper_loss=0.07592, over 16993.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01189, ecapa_loss=0.0002412, whisper_loss=0.09542, over 3912584.95 frames. ], batch size: 69, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:32:25,499 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-10 16:32:26,395 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 from AS
2024-08-10 16:32:49,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=645320.0, ans=0.125
2024-08-10 16:33:05,148 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 10 from Vox, 31 from AS
2024-08-10 16:33:13,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=645520.0, ans=0.125
2024-08-10 16:33:17,622 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 30 from Vox, 28 from AS
2024-08-10 16:33:23,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=645620.0, ans=0.125
2024-08-10 16:33:25,173 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0
2024-08-10 16:33:27,365 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS
2024-08-10 16:33:28,881 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 from AS
2024-08-10 16:33:32,493 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6600, loss[loss=0.09637, beats_loss=0.01263, ecapa_loss=0.0002487, whisper_loss=0.08125, over 13343.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01181, ecapa_loss=0.0002427, whisper_loss=0.09578, over 3922184.15 frames. ], batch size: 53, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:33:46,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=645820.0, ans=0.125
2024-08-10 16:33:47,743 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 22 from Vox, 30 from AS
2024-08-10 16:33:49,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0
2024-08-10 16:33:53,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=645820.0, ans=0.0
2024-08-10 16:33:53,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=645820.0, ans=0.125
2024-08-10 16:33:55,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=645820.0, ans=0.95
2024-08-10 16:34:04,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=645920.0, ans=0.125
2024-08-10 16:34:06,359 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0
2024-08-10 16:34:07,717 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.08 vs. limit=15.0
2024-08-10 16:34:11,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.864e+01 3.355e+01 3.985e+01 6.693e+01, threshold=6.710e+01, percent-clipped=1.0
2024-08-10 16:34:12,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=646020.0, ans=0.2
2024-08-10 16:34:15,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=646020.0, ans=0.125
2024-08-10 16:34:17,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=646020.0, ans=0.125
2024-08-10 16:34:29,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=646120.0, ans=0.0
2024-08-10 16:34:34,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=646120.0, ans=0.125
2024-08-10 16:34:43,463 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6650, loss[loss=0.1087, beats_loss=0.0114, ecapa_loss=0.000294, whisper_loss=0.09435, over 14957.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01186, ecapa_loss=0.0002422, whisper_loss=0.09585, over 3932641.38 frames. ], batch size: 64, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:34:52,001 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 from AS
2024-08-10 16:34:54,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=646220.0, ans=0.05
2024-08-10 16:34:56,656 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0
2024-08-10 16:35:04,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=646320.0, ans=0.0
2024-08-10 16:35:07,738 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 from AS
2024-08-10 16:35:12,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=646420.0, ans=0.0
2024-08-10 16:35:31,873 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.18 vs. limit=6.0
2024-08-10 16:35:41,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=646620.0, ans=0.125
2024-08-10 16:35:54,145 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 19 from Vox, 38 from AS
2024-08-10 16:35:55,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6700, loss[loss=0.09852, beats_loss=0.01369, ecapa_loss=0.0001936, whisper_loss=0.0829, over 19572.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01192, ecapa_loss=0.0002391, whisper_loss=0.09579, over 3942339.52 frames. ], batch size: 76, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:36:04,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=646720.0, ans=0.0
2024-08-10 16:36:12,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=646820.0, ans=0.0
2024-08-10 16:36:12,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=646820.0, ans=0.2
2024-08-10 16:36:26,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=646920.0, ans=0.125
2024-08-10 16:36:27,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=646920.0, ans=0.0
2024-08-10 16:36:34,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.879e+01 3.180e+01 3.709e+01 5.171e+01, threshold=6.361e+01, percent-clipped=0.0
2024-08-10 16:37:05,433 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6750, loss[loss=0.1214, beats_loss=0.01134, ecapa_loss=0.0002572, whisper_loss=0.1074, over 23106.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01187, ecapa_loss=0.0002415, whisper_loss=0.09563, over 3904611.67 frames. ], batch size: 94, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:37:13,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=647220.0, ans=0.5
2024-08-10 16:37:13,668 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=12.0
2024-08-10 16:37:21,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=647320.0, ans=0.1
2024-08-10 16:37:27,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=647320.0, ans=0.1
2024-08-10 16:37:41,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=647420.0, ans=0.0
2024-08-10 16:37:43,119 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.24 vs. limit=15.0
2024-08-10 16:37:55,046 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.13 vs. limit=22.5
2024-08-10 16:38:12,908 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6800, loss[loss=0.127, beats_loss=0.0113, ecapa_loss=0.0002036, whisper_loss=0.1136, over 23613.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01183, ecapa_loss=0.0002431, whisper_loss=0.09471, over 3893779.17 frames. ], batch size: 90, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:38:14,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=647720.0, ans=0.125
2024-08-10 16:38:30,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=647820.0, ans=0.125
2024-08-10 16:38:32,460 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-10 16:38:33,493 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 15 from Vox, 24 from AS
2024-08-10 16:38:33,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=647820.0, ans=0.0
2024-08-10 16:38:53,178 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 2.892e+01 3.322e+01 4.059e+01 7.063e+01, threshold=6.643e+01, percent-clipped=1.0
2024-08-10 16:39:00,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=648020.0, ans=0.125
2024-08-10 16:39:00,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0
2024-08-10 16:39:02,695 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 19 from Vox, 32 from AS
2024-08-10 16:39:18,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=648120.0, ans=0.04949747468305833
2024-08-10 16:39:22,821 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6850, loss[loss=0.1293, beats_loss=0.01057, ecapa_loss=0.0002728, whisper_loss=0.116, over 21699.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01188, ecapa_loss=0.0002433, whisper_loss=0.09456, over 3871173.27 frames. ], batch size: 89, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:39:33,756 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. limit=6.0
2024-08-10 16:39:51,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=648420.0, ans=0.0
2024-08-10 16:39:55,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0
2024-08-10 16:39:56,245 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 from AS
2024-08-10 16:40:07,933 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 15 from Vox, 31 from AS
2024-08-10 16:40:31,800 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6900, loss[loss=0.1114, beats_loss=0.01135, ecapa_loss=0.0001823, whisper_loss=0.09821, over 17613.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01177, ecapa_loss=0.0002438, whisper_loss=0.09522, over 3865645.59 frames. ], batch size: 66, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:40:36,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=648720.0, ans=0.125
2024-08-10 16:40:44,673 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 22 from Vox, 48 from AS
2024-08-10 16:40:52,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0
2024-08-10 16:40:54,491 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 29 from Vox, 38 from AS
2024-08-10 16:41:10,313 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.858e+01 3.304e+01 3.695e+01 5.634e+01, threshold=6.608e+01, percent-clipped=0.0
2024-08-10 16:41:10,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=648920.0, ans=0.0
2024-08-10 16:41:11,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649020.0, ans=0.1
2024-08-10 16:41:40,398 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 6950, loss[loss=0.0855, beats_loss=0.01253, ecapa_loss=0.0001707, whisper_loss=0.07127, over 17593.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01187, ecapa_loss=0.0002426, whisper_loss=0.09471, over 3865114.43 frames. ], batch size: 67, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:41:40,605 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 from AS
2024-08-10 16:41:47,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.48 vs. limit=12.0
2024-08-10 16:41:48,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=649220.0, ans=0.0
2024-08-10 16:41:55,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=649320.0, ans=0.125
2024-08-10 16:42:19,463 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.900e+00
2024-08-10 16:42:37,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=649620.0, ans=0.5
2024-08-10 16:42:49,122 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7000, loss[loss=0.1096, beats_loss=0.008792, ecapa_loss=0.0002721, whisper_loss=0.09814, over 21585.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01179, ecapa_loss=0.0002427, whisper_loss=0.09484, over 3849386.36 frames. ], batch size: 87, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:42:56,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.87 vs. limit=15.0
2024-08-10 16:43:01,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=649820.0, ans=0.2
2024-08-10 16:43:06,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=649820.0, ans=0.125
2024-08-10 16:43:06,791 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0
2024-08-10 16:43:25,659 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.800e+01 3.263e+01 3.998e+01 9.402e+01, threshold=6.527e+01, percent-clipped=1.0
2024-08-10 16:43:33,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=650020.0, ans=0.125
2024-08-10 16:43:53,967 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 30 from Vox, 33 from AS
2024-08-10 16:43:55,140 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7050, loss[loss=0.1119, beats_loss=0.0103, ecapa_loss=0.0002882, whisper_loss=0.09875, over 21835.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01184, ecapa_loss=0.0002434, whisper_loss=0.09429, over 3834794.66 frames. ], batch size: 90, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:43:57,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. limit=10.0
2024-08-10 16:44:00,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=650220.0, ans=0.09899494936611666
2024-08-10 16:44:06,470 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 from AS
2024-08-10 16:44:17,058 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 17 from Vox, 30 from AS
2024-08-10 16:44:24,043 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 18 from Vox, 36 from AS
2024-08-10 16:44:45,415 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0
2024-08-10 16:44:46,040 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 15 from Vox, 23 from AS
2024-08-10 16:44:47,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650620.0, ans=0.1
2024-08-10 16:45:01,639 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7100, loss[loss=0.1089, beats_loss=0.01222, ecapa_loss=0.0002371, whisper_loss=0.09432, over 17980.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01184, ecapa_loss=0.0002412, whisper_loss=0.0946, over 3846254.80 frames. ], batch size: 71, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:45:17,523 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0
2024-08-10 16:45:19,463 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 31 from Vox, 32 from AS
2024-08-10 16:45:28,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. limit=6.0
2024-08-10 16:45:29,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=650920.0, ans=0.0
2024-08-10 16:45:39,560 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.645e+01 3.161e+01 3.535e+01 5.692e+01, threshold=6.321e+01, percent-clipped=0.0
2024-08-10 16:46:00,924 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 23 from Vox, 37 from AS
2024-08-10 16:46:08,617 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7150, loss[loss=0.122, beats_loss=0.01038, ecapa_loss=0.0002754, whisper_loss=0.1089, over 22452.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.0119, ecapa_loss=0.0002398, whisper_loss=0.09452, over 3880139.39 frames. ], batch size: 90, lr: 1.24e-02, grad_scale: 137438953472.0
2024-08-10 16:46:15,458 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 from AS
2024-08-10 16:46:21,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=651320.0, ans=0.2
2024-08-10 16:46:26,781 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 20 from Vox, 21 from AS
2024-08-10 16:46:31,966 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 18 from Vox, 43 from AS
2024-08-10 16:46:36,276 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 26 from Vox, 34 from AS
2024-08-10 16:46:45,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=651420.0, ans=0.125
2024-08-10 16:46:47,528 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 28 from LS+wenet, 23 from Vox, 45 from AS
2024-08-10 16:47:17,092 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7200, loss[loss=0.0931, beats_loss=0.007712, ecapa_loss=0.0003325, whisper_loss=0.08206, over 17009.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01184, ecapa_loss=0.0002397, whisper_loss=0.0945, over 3895854.65 frames.
], batch size: 71, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:47:18,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=651720.0, ans=0.125 2024-08-10 16:47:25,693 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2024-08-10 16:47:37,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=651820.0, ans=0.025 2024-08-10 16:47:56,368 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.798e+01 3.257e+01 4.005e+01 1.167e+02, threshold=6.513e+01, percent-clipped=2.0 2024-08-10 16:48:01,736 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 16:48:02,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=652020.0, ans=0.0 2024-08-10 16:48:13,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=652120.0, ans=0.1 2024-08-10 16:48:26,571 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7250, loss[loss=0.1178, beats_loss=0.008279, ecapa_loss=0.000255, whisper_loss=0.107, over 15798.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01186, ecapa_loss=0.0002394, whisper_loss=0.09436, over 3892911.63 frames. ], batch size: 59, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:48:49,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.71 vs. limit=10.0 2024-08-10 16:48:58,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.44 vs. 
limit=15.0 2024-08-10 16:49:07,551 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 16:49:19,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=652520.0, ans=0.02 2024-08-10 16:49:19,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=652520.0, ans=0.125 2024-08-10 16:49:19,334 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. limit=6.0 2024-08-10 16:49:20,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=652520.0, ans=0.125 2024-08-10 16:49:24,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=652620.0, ans=0.125 2024-08-10 16:49:26,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=652620.0, ans=0.125 2024-08-10 16:49:38,479 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7300, loss[loss=0.07865, beats_loss=0.01236, ecapa_loss=0.0002215, whisper_loss=0.06407, over 19672.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01189, ecapa_loss=0.0002392, whisper_loss=0.09426, over 3896892.55 frames. ], batch size: 78, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:50:16,665 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.741e+01 3.048e+01 3.552e+01 4.958e+01, threshold=6.095e+01, percent-clipped=0.0 2024-08-10 16:50:17,424 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. 
limit=15.0 2024-08-10 16:50:31,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-08-10 16:50:46,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7350, loss[loss=0.1103, beats_loss=0.01429, ecapa_loss=0.0002395, whisper_loss=0.09357, over 17441.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01193, ecapa_loss=0.0002392, whisper_loss=0.09415, over 3848172.80 frames. ], batch size: 72, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:50:56,185 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 16:51:00,447 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 16:51:24,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=653420.0, ans=0.0 2024-08-10 16:51:55,927 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 16:51:57,041 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7400, loss[loss=0.1068, beats_loss=0.01047, ecapa_loss=0.0002061, whisper_loss=0.09431, over 14578.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01194, ecapa_loss=0.0002376, whisper_loss=0.09462, over 3844024.63 frames. ], batch size: 55, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:51:59,926 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 16:52:25,724 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.50 vs. limit=10.0 2024-08-10 16:52:30,353 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
29 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-10 16:52:30,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=653920.0, ans=0.0 2024-08-10 16:52:32,768 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 26 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-10 16:52:35,353 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.867e+01 3.212e+01 3.714e+01 5.750e+01, threshold=6.424e+01, percent-clipped=0.0 2024-08-10 16:52:48,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654020.0, ans=0.1 2024-08-10 16:52:50,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.25 vs. limit=22.5 2024-08-10 16:53:01,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=654120.0, ans=0.125 2024-08-10 16:53:04,334 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.64 vs. limit=22.5 2024-08-10 16:53:04,952 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7450, loss[loss=0.09382, beats_loss=0.01106, ecapa_loss=0.0002476, whisper_loss=0.08028, over 18084.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01188, ecapa_loss=0.0002364, whisper_loss=0.0958, over 3862004.03 frames. ], batch size: 73, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:53:08,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=654220.0, ans=0.125 2024-08-10 16:53:08,993 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
22 from LS+wenet, 28 from Vox, 18 fro AS 2024-08-10 16:53:27,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=654320.0, ans=0.125 2024-08-10 16:53:38,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=654420.0, ans=0.0 2024-08-10 16:53:47,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=654520.0, ans=0.2 2024-08-10 16:53:47,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-10 16:54:00,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=654620.0, ans=0.07 2024-08-10 16:54:03,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=654620.0, ans=0.05 2024-08-10 16:54:04,634 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-10 16:54:08,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=654620.0, ans=0.02 2024-08-10 16:54:10,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=654720.0, ans=0.0 2024-08-10 16:54:11,078 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7500, loss[loss=0.09817, beats_loss=0.01581, ecapa_loss=0.0001736, whisper_loss=0.08063, over 19638.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01186, ecapa_loss=0.0002376, whisper_loss=0.09472, over 3863041.74 frames. ], batch size: 75, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:54:17,775 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
28 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 16:54:19,107 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 18 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-10 16:54:19,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=654720.0, ans=0.125 2024-08-10 16:54:28,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=654820.0, ans=0.2 2024-08-10 16:54:48,180 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.920e+01 3.227e+01 3.863e+01 6.212e+01, threshold=6.454e+01, percent-clipped=0.0 2024-08-10 16:55:02,810 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 16:55:17,245 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7550, loss[loss=0.1087, beats_loss=0.01113, ecapa_loss=0.0003129, whisper_loss=0.09445, over 22716.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01187, ecapa_loss=0.0002366, whisper_loss=0.09524, over 3875483.60 frames. ], batch size: 97, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:55:26,952 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 16:55:43,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=655420.0, ans=0.125 2024-08-10 16:56:06,861 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 15 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-10 16:56:15,368 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 20 from Vox, 50 fro AS 2024-08-10 16:56:24,234 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7600, loss[loss=0.09549, beats_loss=0.01505, ecapa_loss=0.00024, whisper_loss=0.07804, over 13245.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01182, ecapa_loss=0.0002373, whisper_loss=0.09478, over 3849848.30 frames. 
], batch size: 54, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:56:28,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=655720.0, ans=0.0 2024-08-10 16:56:43,221 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 16:56:48,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=655820.0, ans=0.1 2024-08-10 16:56:56,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=655920.0, ans=0.125 2024-08-10 16:57:02,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.834e+01 3.413e+01 3.883e+01 8.700e+01, threshold=6.826e+01, percent-clipped=1.0 2024-08-10 16:57:06,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=15.0 2024-08-10 16:57:15,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=656020.0, ans=0.0 2024-08-10 16:57:21,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=656120.0, ans=0.125 2024-08-10 16:57:31,970 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7650, loss[loss=0.09998, beats_loss=0.0105, ecapa_loss=0.0002363, whisper_loss=0.08712, over 15651.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01176, ecapa_loss=0.0002388, whisper_loss=0.09535, over 3873490.56 frames. ], batch size: 61, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:57:34,935 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 
31 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 16:57:42,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=656220.0, ans=0.0 2024-08-10 16:58:01,622 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.84 vs. limit=15.0 2024-08-10 16:58:05,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=656420.0, ans=0.2 2024-08-10 16:58:08,119 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2024-08-10 16:58:14,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=656520.0, ans=0.125 2024-08-10 16:58:29,806 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 16:58:37,476 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7700, loss[loss=0.09003, beats_loss=0.01367, ecapa_loss=0.0002235, whisper_loss=0.07413, over 21121.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01184, ecapa_loss=0.0002391, whisper_loss=0.09461, over 3861598.35 frames. ], batch size: 89, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:58:53,166 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 15 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 16:59:09,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.90 vs. 
limit=22.5 2024-08-10 16:59:10,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=656920.0, ans=0.125 2024-08-10 16:59:14,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=656920.0, ans=0.1 2024-08-10 16:59:15,234 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.911e+01 3.362e+01 3.849e+01 6.405e+01, threshold=6.723e+01, percent-clipped=0.0 2024-08-10 16:59:22,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=657020.0, ans=0.0 2024-08-10 16:59:22,991 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-10 16:59:24,154 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 16:59:27,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=657020.0, ans=0.125 2024-08-10 16:59:31,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.08 vs. limit=10.0 2024-08-10 16:59:33,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=657120.0, ans=0.0 2024-08-10 16:59:35,335 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 16:59:38,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=657120.0, ans=0.0 2024-08-10 16:59:44,371 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7750, loss[loss=0.1085, beats_loss=0.01045, ecapa_loss=0.0003243, whisper_loss=0.09477, over 19558.00 frames. 
], tot_loss[loss=0.1086, beats_loss=0.01183, ecapa_loss=0.0002391, whisper_loss=0.0944, over 3870229.29 frames. ], batch size: 84, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:00:03,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.14 vs. limit=15.0 2024-08-10 17:00:19,192 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 17:00:34,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=657520.0, ans=0.07 2024-08-10 17:00:40,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=657620.0, ans=0.125 2024-08-10 17:00:40,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=657620.0, ans=0.0 2024-08-10 17:00:42,299 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 17:00:51,419 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7800, loss[loss=0.09206, beats_loss=0.0116, ecapa_loss=0.0002429, whisper_loss=0.07803, over 18252.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01178, ecapa_loss=0.0002407, whisper_loss=0.09484, over 3870756.73 frames. ], batch size: 74, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:00:55,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=657720.0, ans=10.0 2024-08-10 17:01:02,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2024-08-10 17:01:11,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. 
limit=6.0 2024-08-10 17:01:28,041 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.784e+01 3.058e+01 3.552e+01 6.431e+01, threshold=6.115e+01, percent-clipped=0.0 2024-08-10 17:01:35,237 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 17:01:57,330 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7850, loss[loss=0.1113, beats_loss=0.01106, ecapa_loss=0.000229, whisper_loss=0.0979, over 20494.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.0118, ecapa_loss=0.0002394, whisper_loss=0.09525, over 3857867.76 frames. ], batch size: 80, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:02:45,588 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2024-08-10 17:02:49,746 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=15.0 2024-08-10 17:02:53,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=658620.0, ans=0.0 2024-08-10 17:02:55,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=658620.0, ans=0.0 2024-08-10 17:02:56,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=658620.0, ans=0.125 2024-08-10 17:02:58,333 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 30 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 17:03:01,262 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 17:03:01,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=658620.0, ans=0.125 2024-08-10 17:03:04,997 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7900, loss[loss=0.1219, beats_loss=0.01462, ecapa_loss=0.0002257, whisper_loss=0.105, over 21014.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.0119, ecapa_loss=0.000238, whisper_loss=0.09559, over 3860036.18 frames. ], batch size: 85, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:03:05,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=658720.0, ans=0.0 2024-08-10 17:03:17,908 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 26 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-10 17:03:27,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=658820.0, ans=0.125 2024-08-10 17:03:34,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=658920.0, ans=0.0 2024-08-10 17:03:42,961 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.841e+01 3.204e+01 3.801e+01 5.785e+01, threshold=6.407e+01, percent-clipped=0.0 2024-08-10 17:03:47,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.36 vs. limit=15.0 2024-08-10 17:04:12,390 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 7950, loss[loss=0.09901, beats_loss=0.01137, ecapa_loss=0.00023, whisper_loss=0.08534, over 17473.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01183, ecapa_loss=0.0002387, whisper_loss=0.09535, over 3865553.33 frames. 
], batch size: 70, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:04:14,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=659220.0, ans=0.125 2024-08-10 17:04:31,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=659320.0, ans=0.125 2024-08-10 17:04:32,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=659320.0, ans=0.0 2024-08-10 17:04:34,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=659320.0, ans=0.125 2024-08-10 17:04:45,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=659420.0, ans=0.1 2024-08-10 17:04:46,091 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 27 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-10 17:04:49,875 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-10 17:04:59,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=659520.0, ans=0.1 2024-08-10 17:05:02,099 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
11 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 17:05:03,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659520.0, ans=0.1 2024-08-10 17:05:08,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=659620.0, ans=0.2 2024-08-10 17:05:08,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=659620.0, ans=0.1 2024-08-10 17:05:12,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=659620.0, ans=0.125 2024-08-10 17:05:19,008 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8000, loss[loss=0.0917, beats_loss=0.01201, ecapa_loss=0.0002516, whisper_loss=0.07718, over 17026.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01183, ecapa_loss=0.0002384, whisper_loss=0.09539, over 3873043.44 frames. 
], batch size: 71, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:05:23,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=659720.0, ans=0.125 2024-08-10 17:05:24,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=659720.0, ans=0.0 2024-08-10 17:05:35,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=659820.0, ans=0.125 2024-08-10 17:05:45,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=659920.0, ans=0.125 2024-08-10 17:05:54,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659920.0, ans=0.1 2024-08-10 17:05:55,807 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.761e+01 3.157e+01 3.536e+01 5.933e+01, threshold=6.314e+01, percent-clipped=0.0 2024-08-10 17:06:13,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=660120.0, ans=0.0 2024-08-10 17:06:21,857 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-10 17:06:25,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8050, loss[loss=0.1374, beats_loss=0.009695, ecapa_loss=0.0002522, whisper_loss=0.1252, over 18037.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01181, ecapa_loss=0.0002373, whisper_loss=0.09608, over 3874306.86 frames. ], batch size: 70, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:06:25,599 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
25 from LS+wenet, 7 from Vox, 22 fro AS 2024-08-10 17:06:41,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=660320.0, ans=0.2 2024-08-10 17:06:43,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=660320.0, ans=0.125 2024-08-10 17:06:45,334 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.17 vs. limit=22.5 2024-08-10 17:06:49,732 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-10 17:06:54,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.02 vs. limit=10.0 2024-08-10 17:07:15,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=660520.0, ans=0.0 2024-08-10 17:07:20,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=660620.0, ans=0.125 2024-08-10 17:07:23,116 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 17:07:24,501 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 17:07:24,833 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.431e-02 2024-08-10 17:07:32,451 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8100, loss[loss=0.09977, beats_loss=0.01295, ecapa_loss=0.0002335, whisper_loss=0.08448, over 16349.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01186, ecapa_loss=0.0002374, whisper_loss=0.095, over 3899493.12 frames. 
], batch size: 67, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:07:35,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=22.5 2024-08-10 17:07:38,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=660720.0, ans=0.125 2024-08-10 17:07:42,308 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.593e-01 2024-08-10 17:07:48,780 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-10 17:07:50,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=660820.0, ans=0.0 2024-08-10 17:08:00,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=660920.0, ans=0.2 2024-08-10 17:08:01,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=660920.0, ans=0.125 2024-08-10 17:08:04,251 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 17:08:05,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=660920.0, ans=0.2 2024-08-10 17:08:09,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 3.015e+01 3.254e+01 3.867e+01 1.141e+02, threshold=6.509e+01, percent-clipped=2.0 2024-08-10 17:08:20,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=661020.0, ans=0.125 2024-08-10 17:08:24,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=661120.0, ans=0.0 2024-08-10 17:08:28,137 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 17:08:31,077 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 17:08:34,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=661120.0, ans=0.0 2024-08-10 17:08:38,795 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8150, loss[loss=0.1048, beats_loss=0.01037, ecapa_loss=0.0002448, whisper_loss=0.09202, over 17195.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01179, ecapa_loss=0.0002392, whisper_loss=0.09487, over 3898827.26 frames. ], batch size: 66, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:09:04,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=661420.0, ans=0.04949747468305833 2024-08-10 17:09:21,727 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 17:09:43,780 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. 
limit=6.0 2024-08-10 17:09:44,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=661720.0, ans=0.125 2024-08-10 17:09:45,399 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8200, loss[loss=0.1045, beats_loss=0.01349, ecapa_loss=0.0002158, whisper_loss=0.08885, over 22749.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01178, ecapa_loss=0.00024, whisper_loss=0.09476, over 3909253.01 frames. ], batch size: 92, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:09:53,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=661720.0, ans=0.0 2024-08-10 17:09:54,242 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-10 17:09:56,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=661720.0, ans=0.0 2024-08-10 17:10:13,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=661920.0, ans=0.09899494936611666 2024-08-10 17:10:18,072 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 17:10:19,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=661920.0, ans=0.125 2024-08-10 17:10:21,902 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.877e+01 3.347e+01 3.681e+01 6.491e+01, threshold=6.694e+01, percent-clipped=0.0 2024-08-10 17:10:32,758 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.07 vs. 
limit=22.5 2024-08-10 17:10:35,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=662020.0, ans=0.125 2024-08-10 17:10:40,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=662120.0, ans=0.125 2024-08-10 17:10:50,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8250, loss[loss=0.0944, beats_loss=0.01167, ecapa_loss=0.0001885, whisper_loss=0.08085, over 18199.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01181, ecapa_loss=0.0002383, whisper_loss=0.09477, over 3912432.61 frames. ], batch size: 65, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:10:53,700 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 12 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 17:11:05,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=662320.0, ans=0.0 2024-08-10 17:11:06,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=662320.0, ans=0.0 2024-08-10 17:11:12,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=662320.0, ans=0.125 2024-08-10 17:11:13,877 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 27 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-10 17:11:24,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=662420.0, ans=0.2 2024-08-10 17:11:40,729 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 17:11:52,511 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 17:11:52,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=662620.0, ans=0.125 2024-08-10 17:11:56,345 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 17:11:57,492 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8300, loss[loss=0.1266, beats_loss=0.01088, ecapa_loss=0.0003036, whisper_loss=0.1126, over 19577.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.0119, ecapa_loss=0.0002363, whisper_loss=0.09458, over 3923310.29 frames. ], batch size: 80, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:12:00,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=662720.0, ans=0.2 2024-08-10 17:12:07,524 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 17:12:08,562 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 17:12:28,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=662920.0, ans=0.0 2024-08-10 17:12:34,972 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+01 2.908e+01 3.363e+01 4.143e+01 6.461e+01, threshold=6.726e+01, percent-clipped=0.0 2024-08-10 17:12:54,764 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-10 17:12:59,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-10 17:13:04,089 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8350, loss[loss=0.1001, beats_loss=0.01137, ecapa_loss=0.0002746, whisper_loss=0.08601, over 21369.00 frames. 
], tot_loss[loss=0.1081, beats_loss=0.01201, ecapa_loss=0.0002363, whisper_loss=0.09369, over 3890943.60 frames. ], batch size: 89, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:13:14,297 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 17:13:17,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=663320.0, ans=0.2 2024-08-10 17:13:20,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663320.0, ans=0.1 2024-08-10 17:13:24,960 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=12.0 2024-08-10 17:13:27,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2024-08-10 17:13:28,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=663320.0, ans=0.2 2024-08-10 17:13:44,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=663420.0, ans=0.125 2024-08-10 17:13:52,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=663520.0, ans=0.2 2024-08-10 17:13:53,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=663520.0, ans=0.2 2024-08-10 17:13:57,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=663520.0, ans=0.125 2024-08-10 17:14:10,355 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 17:14:15,992 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8400, loss[loss=0.1141, beats_loss=0.01023, ecapa_loss=0.000237, whisper_loss=0.1015, over 20752.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01194, ecapa_loss=0.0002361, whisper_loss=0.09447, over 3893480.25 frames. ], batch size: 82, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:14:19,067 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 17:14:30,630 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-10 17:14:30,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=663820.0, ans=0.125 2024-08-10 17:14:48,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=663920.0, ans=0.0 2024-08-10 17:14:56,194 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.836e+01 3.172e+01 3.671e+01 5.154e+01, threshold=6.343e+01, percent-clipped=0.0 2024-08-10 17:15:12,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5 2024-08-10 17:15:18,954 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 17:15:28,714 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-10 17:15:29,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8450, loss[loss=0.1281, beats_loss=0.009157, ecapa_loss=0.0002392, whisper_loss=0.1166, over 20336.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01184, ecapa_loss=0.0002362, whisper_loss=0.09424, over 3883835.20 frames. 
], batch size: 80, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:15:35,131 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 17:15:37,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=664220.0, ans=0.2 2024-08-10 17:15:38,939 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-10 17:15:48,383 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.31 vs. limit=15.0 2024-08-10 17:15:55,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=664320.0, ans=0.125 2024-08-10 17:16:05,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=664420.0, ans=0.0 2024-08-10 17:16:09,095 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 17:16:14,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=664520.0, ans=0.1 2024-08-10 17:16:15,801 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 33 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 17:16:17,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=664520.0, ans=0.125 2024-08-10 17:16:19,901 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
28 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-10 17:16:20,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=664520.0, ans=0.125 2024-08-10 17:16:42,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8500, loss[loss=0.1341, beats_loss=0.01238, ecapa_loss=0.0002467, whisper_loss=0.1192, over 22485.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01179, ecapa_loss=0.0002367, whisper_loss=0.09516, over 3914267.26 frames. ], batch size: 89, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:16:44,939 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2024-08-10 17:17:17,188 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 17:17:26,711 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.856e+01 3.264e+01 3.786e+01 7.141e+01, threshold=6.528e+01, percent-clipped=1.0 2024-08-10 17:17:27,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=664920.0, ans=0.1 2024-08-10 17:17:28,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=665020.0, ans=0.5 2024-08-10 17:17:29,850 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 17:17:30,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0 2024-08-10 17:17:43,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=665120.0, ans=0.1 2024-08-10 17:17:50,909 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 17:17:52,793 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 17:17:58,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=665220.0, ans=0.125 2024-08-10 17:18:00,057 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8550, loss[loss=0.1188, beats_loss=0.01148, ecapa_loss=0.0002894, whisper_loss=0.1044, over 22194.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01179, ecapa_loss=0.000237, whisper_loss=0.09494, over 3925658.34 frames. ], batch size: 90, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:18:03,977 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 17:18:06,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=665220.0, ans=12.0 2024-08-10 17:18:21,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=665320.0, ans=0.1 2024-08-10 17:18:21,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=665320.0, ans=0.125 2024-08-10 17:18:31,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=665420.0, ans=0.125 2024-08-10 17:18:35,090 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-10 17:18:37,012 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 17:18:47,210 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.30 vs. 
limit=10.0 2024-08-10 17:18:49,892 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.49 vs. limit=10.0 2024-08-10 17:18:54,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=665520.0, ans=0.125 2024-08-10 17:19:16,587 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8600, loss[loss=0.108, beats_loss=0.01385, ecapa_loss=0.0002227, whisper_loss=0.09189, over 19118.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01178, ecapa_loss=0.0002363, whisper_loss=0.09499, over 3893343.82 frames. ], batch size: 81, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:19:17,480 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.79 vs. limit=15.0 2024-08-10 17:19:23,907 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.46 vs. limit=6.0 2024-08-10 17:19:33,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=665820.0, ans=0.2 2024-08-10 17:19:38,402 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-10 17:19:43,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=22.5 2024-08-10 17:19:51,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=665920.0, ans=0.125 2024-08-10 17:19:54,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.57 vs. 
limit=10.0 2024-08-10 17:19:58,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=22.5 2024-08-10 17:20:05,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.897e+01 3.260e+01 3.635e+01 5.528e+01, threshold=6.520e+01, percent-clipped=0.0 2024-08-10 17:20:38,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.28 vs. limit=15.0 2024-08-10 17:20:41,452 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8650, loss[loss=0.1281, beats_loss=0.009248, ecapa_loss=0.0002264, whisper_loss=0.1166, over 21804.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01185, ecapa_loss=0.0002362, whisper_loss=0.09531, over 3912142.59 frames. ], batch size: 84, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:20:50,023 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-10 17:21:49,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=666520.0, ans=0.035 2024-08-10 17:22:14,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8700, loss[loss=0.1258, beats_loss=0.01134, ecapa_loss=0.0002065, whisper_loss=0.1124, over 16940.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01184, ecapa_loss=0.0002372, whisper_loss=0.09507, over 3902759.59 frames. ], batch size: 63, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:22:25,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=666720.0, ans=0.125 2024-08-10 17:22:29,539 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 17:22:55,989 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
16 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 17:23:16,588 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.246e+01 2.997e+01 3.503e+01 4.044e+01 1.535e+02, threshold=7.007e+01, percent-clipped=1.0 2024-08-10 17:23:27,312 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=12.0 2024-08-10 17:23:48,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=667120.0, ans=0.0 2024-08-10 17:23:52,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=667120.0, ans=0.125 2024-08-10 17:23:57,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=667220.0, ans=0.125 2024-08-10 17:23:58,396 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8750, loss[loss=0.09981, beats_loss=0.01087, ecapa_loss=0.0002772, whisper_loss=0.08617, over 16258.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01183, ecapa_loss=0.0002382, whisper_loss=0.09496, over 3882970.55 frames. ], batch size: 69, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:24:13,083 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 17:24:18,316 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-10 17:24:42,102 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 17:25:12,198 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-10 17:25:24,169 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
33 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 17:25:57,736 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8800, loss[loss=0.1058, beats_loss=0.014, ecapa_loss=0.0002239, whisper_loss=0.0896, over 23211.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01195, ecapa_loss=0.0002383, whisper_loss=0.09477, over 3925166.95 frames. ], batch size: 93, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:26:15,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=667720.0, ans=0.125 2024-08-10 17:26:24,544 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 17:26:28,677 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 24 from LS+wenet, 35 from Vox, 36 fro AS 2024-08-10 17:27:04,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.842e+01 3.119e+01 3.570e+01 8.103e+01, threshold=6.239e+01, percent-clipped=1.0 2024-08-10 17:27:19,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=668020.0, ans=0.0 2024-08-10 17:28:03,280 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8850, loss[loss=0.1079, beats_loss=0.01184, ecapa_loss=0.0002029, whisper_loss=0.09408, over 18845.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01205, ecapa_loss=0.0002348, whisper_loss=0.09446, over 3921610.53 frames. ], batch size: 74, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:28:03,462 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
18 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-10 17:28:07,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=668220.0, ans=10.0 2024-08-10 17:28:11,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=668220.0, ans=0.125 2024-08-10 17:28:15,302 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 17:29:33,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=668520.0, ans=0.125 2024-08-10 17:29:38,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=668620.0, ans=0.0 2024-08-10 17:29:46,435 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-10 17:29:47,618 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-10 17:29:53,700 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8900, loss[loss=0.09794, beats_loss=0.0129, ecapa_loss=0.0002498, whisper_loss=0.08254, over 20399.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01201, ecapa_loss=0.0002353, whisper_loss=0.09437, over 3911759.87 frames. ], batch size: 85, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:29:56,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=668720.0, ans=0.1 2024-08-10 17:29:59,881 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-10 17:30:04,499 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 17:30:14,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. 
limit=6.0 2024-08-10 17:30:26,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=668920.0, ans=0.0 2024-08-10 17:30:29,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.73 vs. limit=10.0 2024-08-10 17:30:32,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.82 vs. limit=12.0 2024-08-10 17:30:36,287 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.696e+01 3.082e+01 3.587e+01 7.840e+01, threshold=6.164e+01, percent-clipped=1.0 2024-08-10 17:30:41,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=669020.0, ans=0.125 2024-08-10 17:30:58,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=669120.0, ans=0.125 2024-08-10 17:30:59,210 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2024-08-10 17:31:03,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=669120.0, ans=0.1 2024-08-10 17:31:04,796 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 17:31:08,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=669120.0, ans=0.04949747468305833 2024-08-10 17:31:11,417 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 8950, loss[loss=0.0925, beats_loss=0.01082, ecapa_loss=0.000265, whisper_loss=0.07903, over 20938.00 frames. 
], tot_loss[loss=0.1085, beats_loss=0.01196, ecapa_loss=0.0002367, whisper_loss=0.0942, over 3916122.29 frames. ], batch size: 87, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:31:12,904 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 17:31:21,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=669220.0, ans=0.125 2024-08-10 17:31:24,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=669220.0, ans=0.0 2024-08-10 17:32:20,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=669620.0, ans=0.1 2024-08-10 17:32:28,708 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9000, loss[loss=0.1084, beats_loss=0.01179, ecapa_loss=0.0002699, whisper_loss=0.09395, over 20380.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01189, ecapa_loss=0.0002363, whisper_loss=0.09459, over 3895316.03 frames. ], batch size: 85, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:32:28,709 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 17:32:53,770 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.0222, 1.7534, 2.5875, 1.6487, 3.0917, 3.0262, 2.3152, 2.1820], device='cuda:2') 2024-08-10 17:33:04,112 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on ASR_libri: loss=0.2625, beats_loss=0, ecapa_loss=0.0007367, whisper_loss=0.2551, over 922467.00 frames. 2024-08-10 17:33:20,353 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on SV_voxceleb1: loss=0.006282, beats_loss=0, ecapa_loss=0.0006282, whisper_loss=0, over 939242.00 frames. 
2024-08-10 17:34:29,067 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3141, 1.2419, 1.2062, 0.9006, 0.7290, 1.2145, 1.3151, 0.8258], device='cuda:2') 2024-08-10 17:35:00,387 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.2962, 2.6526, 2.2248, 2.4687], device='cuda:2') 2024-08-10 17:35:05,200 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on AT_audioset: loss=0.02673, beats_loss=0.02673, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 17:35:05,204 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 17:35:05,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2024-08-10 17:35:10,756 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 17:35:14,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=669720.0, ans=0.125 2024-08-10 17:35:47,862 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.797e+01 3.113e+01 3.593e+01 8.640e+01, threshold=6.226e+01, percent-clipped=2.0 2024-08-10 17:35:59,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=670020.0, ans=0.125 2024-08-10 17:36:04,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=670020.0, ans=0.125 2024-08-10 17:36:19,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.66 vs. 
limit=15.0 2024-08-10 17:36:21,162 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9050, loss[loss=0.1055, beats_loss=0.01184, ecapa_loss=0.0002705, whisper_loss=0.09098, over 20963.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01194, ecapa_loss=0.0002357, whisper_loss=0.0943, over 3896348.23 frames. ], batch size: 90, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:36:30,851 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 17:36:32,245 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 17:36:33,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=670220.0, ans=0.0 2024-08-10 17:36:35,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=670320.0, ans=0.2 2024-08-10 17:36:43,881 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-10 17:36:46,455 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 17:37:11,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=670520.0, ans=0.2 2024-08-10 17:37:15,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=670520.0, ans=0.125 2024-08-10 17:37:17,665 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-10 17:37:27,192 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 17:37:29,416 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.85 vs. 
limit=22.5 2024-08-10 17:37:35,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.64 vs. limit=12.0 2024-08-10 17:37:35,517 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9100, loss[loss=0.1028, beats_loss=0.01236, ecapa_loss=0.0002315, whisper_loss=0.0881, over 18667.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01178, ecapa_loss=0.0002383, whisper_loss=0.09489, over 3881715.26 frames. ], batch size: 75, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:37:35,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=670720.0, ans=0.04949747468305833 2024-08-10 17:37:43,931 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-10 17:37:45,576 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 33 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 17:37:46,875 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=12.0 2024-08-10 17:38:06,260 INFO [train_multi_KD3.py:844] (2/4) A total of 97 cuts. 18 from LS+wenet, 37 from Vox, 42 fro AS 2024-08-10 17:38:12,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=670920.0, ans=0.0 2024-08-10 17:38:16,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.910e+01 3.253e+01 3.723e+01 6.048e+01, threshold=6.507e+01, percent-clipped=0.0 2024-08-10 17:38:21,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=671020.0, ans=0.125 2024-08-10 17:38:22,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.93 vs. 
limit=22.5 2024-08-10 17:38:37,221 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-10 17:38:37,531 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.656e-01 2024-08-10 17:38:49,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9150, loss[loss=0.08832, beats_loss=0.01324, ecapa_loss=0.0002388, whisper_loss=0.0727, over 15138.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01181, ecapa_loss=0.0002381, whisper_loss=0.09479, over 3917478.82 frames. ], batch size: 63, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:39:26,035 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 17:39:31,251 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 17:39:41,602 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.959e-01 2024-08-10 17:39:54,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=671620.0, ans=0.125 2024-08-10 17:40:01,757 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 17:40:09,791 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9200, loss[loss=0.08877, beats_loss=0.01246, ecapa_loss=0.0002694, whisper_loss=0.07362, over 18797.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01176, ecapa_loss=0.0002389, whisper_loss=0.09517, over 3896699.82 frames. 
], batch size: 79, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:40:10,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=671720.0, ans=0.1 2024-08-10 17:40:33,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=671820.0, ans=0.0 2024-08-10 17:40:34,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=671820.0, ans=0.125 2024-08-10 17:40:37,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671820.0, ans=0.1 2024-08-10 17:40:37,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671820.0, ans=0.1 2024-08-10 17:40:46,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=671920.0, ans=0.2 2024-08-10 17:40:53,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.732e+01 3.100e+01 3.483e+01 6.432e+01, threshold=6.200e+01, percent-clipped=0.0 2024-08-10 17:41:01,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.46 vs. limit=15.0 2024-08-10 17:41:27,119 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9250, loss[loss=0.1239, beats_loss=0.009691, ecapa_loss=0.0002951, whisper_loss=0.1113, over 21298.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01181, ecapa_loss=0.0002367, whisper_loss=0.09519, over 3912411.04 frames. 
], batch size: 89, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:41:27,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=672220.0, ans=0.0 2024-08-10 17:41:36,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=672220.0, ans=0.2 2024-08-10 17:42:06,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-08-10 17:42:07,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=672420.0, ans=0.125 2024-08-10 17:42:13,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=672520.0, ans=0.04949747468305833 2024-08-10 17:42:15,861 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 17:42:16,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=672520.0, ans=0.1 2024-08-10 17:42:23,270 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-10 17:42:23,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=672520.0, ans=0.2 2024-08-10 17:42:29,267 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 33 from LS+wenet, 9 from Vox, 36 fro AS 2024-08-10 17:42:33,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-10 17:42:37,100 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
21 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-10 17:42:42,569 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9300, loss[loss=0.104, beats_loss=0.01173, ecapa_loss=0.0002388, whisper_loss=0.08992, over 22227.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01181, ecapa_loss=0.000236, whisper_loss=0.09547, over 3927380.48 frames. ], batch size: 86, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:42:46,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=672720.0, ans=15.0 2024-08-10 17:43:05,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=672820.0, ans=0.0 2024-08-10 17:43:28,103 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.985e+01 3.331e+01 3.923e+01 7.099e+01, threshold=6.662e+01, percent-clipped=2.0 2024-08-10 17:43:30,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=673020.0, ans=0.0 2024-08-10 17:43:34,515 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-10 17:43:40,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=673020.0, ans=0.125 2024-08-10 17:43:43,578 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 17:43:49,236 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 36 from Vox, 33 fro AS 2024-08-10 17:43:59,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=673120.0, ans=0.2 2024-08-10 17:44:00,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. 
limit=15.0 2024-08-10 17:44:01,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=673120.0, ans=0.0 2024-08-10 17:44:05,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9350, loss[loss=0.1205, beats_loss=0.01074, ecapa_loss=0.0002624, whisper_loss=0.1071, over 18004.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01184, ecapa_loss=0.0002352, whisper_loss=0.09523, over 3905996.35 frames. ], batch size: 77, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:44:32,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673320.0, ans=0.1 2024-08-10 17:44:33,580 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.716e-01 2024-08-10 17:45:00,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=673520.0, ans=0.1 2024-08-10 17:45:06,486 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.10 vs. limit=15.0 2024-08-10 17:45:07,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=673620.0, ans=0.1 2024-08-10 17:45:10,743 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=23.74 vs. limit=15.0 2024-08-10 17:45:11,932 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-10 17:45:22,860 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9400, loss[loss=0.1129, beats_loss=0.009623, ecapa_loss=0.0003009, whisper_loss=0.1003, over 17443.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01183, ecapa_loss=0.0002346, whisper_loss=0.09521, over 3900132.59 frames. 
], batch size: 71, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:45:30,336 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 17:45:34,767 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 34 from Vox, 31 fro AS 2024-08-10 17:45:35,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=673720.0, ans=0.1 2024-08-10 17:45:50,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=673820.0, ans=0.125 2024-08-10 17:45:55,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=673920.0, ans=0.0 2024-08-10 17:46:05,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.795e+01 3.115e+01 3.725e+01 7.083e+01, threshold=6.231e+01, percent-clipped=1.0 2024-08-10 17:46:15,826 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 17:46:28,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=674120.0, ans=0.125 2024-08-10 17:46:36,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9450, loss[loss=0.1261, beats_loss=0.0114, ecapa_loss=0.0002433, whisper_loss=0.1123, over 23795.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01197, ecapa_loss=0.0002352, whisper_loss=0.09434, over 3890644.99 frames. 
], batch size: 94, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:46:37,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=674220.0, ans=0.035 2024-08-10 17:46:40,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=674220.0, ans=0.125 2024-08-10 17:46:52,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=674320.0, ans=0.0 2024-08-10 17:46:52,994 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-10 17:46:53,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=674320.0, ans=0.125 2024-08-10 17:46:57,036 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2024-08-10 17:47:00,249 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 17:47:00,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2024-08-10 17:47:06,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=674420.0, ans=0.125 2024-08-10 17:47:11,418 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 29 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-10 17:47:31,779 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 17:47:33,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=674620.0, ans=0.015 2024-08-10 17:47:37,075 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 17:47:40,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=674620.0, ans=0.125 2024-08-10 17:47:41,431 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-10 17:47:43,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=674620.0, ans=0.1 2024-08-10 17:47:48,970 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9500, loss[loss=0.1195, beats_loss=0.0102, ecapa_loss=0.0002421, whisper_loss=0.1069, over 22712.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01186, ecapa_loss=0.0002381, whisper_loss=0.09447, over 3874052.30 frames. ], batch size: 89, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:48:15,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=674820.0, ans=0.125 2024-08-10 17:48:20,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=674920.0, ans=0.125 2024-08-10 17:48:21,941 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 13 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-10 17:48:32,162 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 17:48:33,272 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.877e+01 3.250e+01 3.723e+01 7.953e+01, threshold=6.499e+01, percent-clipped=3.0 2024-08-10 17:48:33,553 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 17:48:42,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=675020.0, ans=0.125 2024-08-10 17:48:47,614 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-10 17:49:05,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9550, loss[loss=0.1142, beats_loss=0.01143, ecapa_loss=0.0002265, whisper_loss=0.1005, over 16477.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01193, ecapa_loss=0.0002382, whisper_loss=0.09397, over 3859665.45 frames. ], batch size: 63, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:49:20,185 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 20 from LS+wenet, 23 from Vox, 52 fro AS 2024-08-10 17:49:21,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=675320.0, ans=0.0 2024-08-10 17:49:32,166 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 14 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-10 17:49:37,603 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.91 vs. limit=22.5 2024-08-10 17:49:46,379 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-10 17:49:59,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675520.0, ans=0.1 2024-08-10 17:50:17,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=675620.0, ans=0.2 2024-08-10 17:50:21,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9600, loss[loss=0.1117, beats_loss=0.009279, ecapa_loss=0.0002643, whisper_loss=0.09976, over 21624.00 frames. 
], tot_loss[loss=0.1075, beats_loss=0.01199, ecapa_loss=0.0002374, whisper_loss=0.0931, over 3852999.01 frames. ], batch size: 89, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:50:28,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=675720.0, ans=0.0 2024-08-10 17:50:39,120 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 17:51:02,246 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 2.698e+01 2.997e+01 3.348e+01 4.884e+01, threshold=5.995e+01, percent-clipped=0.0 2024-08-10 17:51:08,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=676020.0, ans=0.125 2024-08-10 17:51:22,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=676120.0, ans=0.125 2024-08-10 17:51:32,455 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9650, loss[loss=0.1228, beats_loss=0.01081, ecapa_loss=0.0002564, whisper_loss=0.1094, over 17970.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01188, ecapa_loss=0.0002387, whisper_loss=0.09397, over 3819372.47 frames. ], batch size: 74, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:51:44,311 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 17:51:47,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=676320.0, ans=0.125 2024-08-10 17:52:12,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=676420.0, ans=0.0 2024-08-10 17:52:25,147 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 17:52:25,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=676520.0, ans=0.0 2024-08-10 17:52:27,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=676520.0, ans=0.125 2024-08-10 17:52:45,188 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9700, loss[loss=0.1377, beats_loss=0.009927, ecapa_loss=0.0002308, whisper_loss=0.1255, over 23846.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01177, ecapa_loss=0.000239, whisper_loss=0.09487, over 3842436.36 frames. ], batch size: 92, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:52:54,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=676720.0, ans=0.0 2024-08-10 17:53:02,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=676820.0, ans=0.125 2024-08-10 17:53:03,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=676820.0, ans=0.2 2024-08-10 17:53:16,225 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 17:53:27,453 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.851e+01 3.065e+01 3.509e+01 5.015e+01, threshold=6.131e+01, percent-clipped=0.0 2024-08-10 17:53:33,415 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
20 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 17:53:33,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=677020.0, ans=0.125 2024-08-10 17:53:33,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=677020.0, ans=0.0 2024-08-10 17:53:35,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=677020.0, ans=0.1 2024-08-10 17:53:53,099 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.367e-02 2024-08-10 17:53:59,573 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9750, loss[loss=0.09802, beats_loss=0.01569, ecapa_loss=0.0001859, whisper_loss=0.08047, over 21580.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01177, ecapa_loss=0.0002378, whisper_loss=0.09488, over 3830018.22 frames. ], batch size: 89, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:54:11,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677220.0, ans=0.1 2024-08-10 17:54:17,773 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.84 vs. limit=12.0 2024-08-10 17:54:20,437 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 17:54:25,596 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-08-10 17:54:43,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5 2024-08-10 17:54:45,910 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
12 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 17:54:51,966 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 17:54:52,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=677520.0, ans=0.05 2024-08-10 17:54:54,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=677520.0, ans=0.1 2024-08-10 17:55:00,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=677620.0, ans=0.125 2024-08-10 17:55:10,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=677620.0, ans=0.125 2024-08-10 17:55:10,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.77 vs. limit=22.5 2024-08-10 17:55:12,954 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9800, loss[loss=0.1115, beats_loss=0.009876, ecapa_loss=0.0002762, whisper_loss=0.09883, over 22879.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01171, ecapa_loss=0.0002371, whisper_loss=0.09565, over 3842312.73 frames. 
], batch size: 93, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:55:22,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=677720.0, ans=0.125 2024-08-10 17:55:51,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=677920.0, ans=0.125 2024-08-10 17:55:54,550 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.700e+01 3.065e+01 3.596e+01 6.450e+01, threshold=6.130e+01, percent-clipped=1.0 2024-08-10 17:55:55,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=677920.0, ans=0.0 2024-08-10 17:56:02,545 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 17:56:16,025 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 17:56:22,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=678120.0, ans=0.0 2024-08-10 17:56:26,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9850, loss[loss=0.1156, beats_loss=0.01164, ecapa_loss=0.0003164, whisper_loss=0.1008, over 21937.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01175, ecapa_loss=0.0002377, whisper_loss=0.0951, over 3841576.25 frames. ], batch size: 93, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:56:30,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=678220.0, ans=0.0 2024-08-10 17:56:41,818 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
28 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 17:56:53,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=678320.0, ans=0.125 2024-08-10 17:56:57,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=678420.0, ans=0.125 2024-08-10 17:57:04,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=678420.0, ans=0.2 2024-08-10 17:57:10,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=678520.0, ans=0.0 2024-08-10 17:57:23,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=678520.0, ans=0.2 2024-08-10 17:57:28,329 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 17:57:35,728 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 33 from Vox, 28 fro AS 2024-08-10 17:57:35,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=678620.0, ans=0.2 2024-08-10 17:57:40,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678720.0, ans=0.1 2024-08-10 17:57:41,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9900, loss[loss=0.1259, beats_loss=0.008925, ecapa_loss=0.0002602, whisper_loss=0.1144, over 22409.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01175, ecapa_loss=0.0002377, whisper_loss=0.09564, over 3878235.30 frames. 
], batch size: 89, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:57:51,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=678720.0, ans=0.0 2024-08-10 17:58:02,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=678820.0, ans=0.0 2024-08-10 17:58:05,287 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 17:58:19,702 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.724e+01 3.027e+01 3.695e+01 5.994e+01, threshold=6.053e+01, percent-clipped=0.0 2024-08-10 17:58:28,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=679020.0, ans=0.125 2024-08-10 17:58:33,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=679020.0, ans=0.1 2024-08-10 17:58:46,118 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 17:58:47,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=679120.0, ans=0.015 2024-08-10 17:58:50,504 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 9950, loss[loss=0.1194, beats_loss=0.0134, ecapa_loss=0.0002111, whisper_loss=0.1039, over 23548.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01174, ecapa_loss=0.0002364, whisper_loss=0.09544, over 3887223.36 frames. ], batch size: 93, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:58:58,653 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 17:59:25,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=679420.0, ans=0.0 2024-08-10 17:59:28,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=679420.0, ans=0.0 2024-08-10 17:59:51,748 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-10 17:59:55,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=679620.0, ans=0.125 2024-08-10 18:00:04,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10000, loss[loss=0.1183, beats_loss=0.01117, ecapa_loss=0.0002536, whisper_loss=0.1046, over 22589.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01172, ecapa_loss=0.000237, whisper_loss=0.09555, over 3889301.06 frames. ], batch size: 92, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 18:00:07,635 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 18:00:47,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.793e+01 3.118e+01 3.876e+01 5.816e+01, threshold=6.237e+01, percent-clipped=0.0 2024-08-10 18:00:53,280 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2024-08-10 18:00:54,177 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-10 18:00:54,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680020.0, ans=0.1 2024-08-10 18:01:04,423 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
24 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 18:01:09,877 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-10 18:01:13,635 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 16 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-10 18:01:17,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10050, loss[loss=0.09252, beats_loss=0.01181, ecapa_loss=0.0002223, whisper_loss=0.07848, over 15657.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01169, ecapa_loss=0.000238, whisper_loss=0.09547, over 3885370.14 frames. ], batch size: 61, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:01:18,815 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.85 vs. limit=10.0 2024-08-10 18:01:46,931 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 18:01:52,842 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 18:01:59,288 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 24 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-10 18:02:07,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=680520.0, ans=0.04949747468305833 2024-08-10 18:02:08,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=680520.0, ans=0.125 2024-08-10 18:02:20,106 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 18:02:30,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10100, loss[loss=0.08464, beats_loss=0.01494, ecapa_loss=0.0002058, whisper_loss=0.06765, over 19980.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01178, ecapa_loss=0.0002378, whisper_loss=0.09516, over 3915558.32 frames. 
], batch size: 82, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:02:39,812 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 18:02:42,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=680720.0, ans=0.125 2024-08-10 18:02:56,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680820.0, ans=0.1 2024-08-10 18:02:58,867 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 18:03:11,257 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.354e+03 2024-08-10 18:03:12,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+01 2.905e+01 3.182e+01 3.646e+01 5.979e+01, threshold=6.363e+01, percent-clipped=0.0 2024-08-10 18:03:16,566 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=15.0 2024-08-10 18:03:17,275 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 25 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-10 18:03:21,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=681020.0, ans=0.0 2024-08-10 18:03:23,007 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 18:03:33,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.06 vs. limit=15.0 2024-08-10 18:03:48,113 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10150, loss[loss=0.115, beats_loss=0.01035, ecapa_loss=0.0002487, whisper_loss=0.1021, over 22060.00 frames. 
], tot_loss[loss=0.109, beats_loss=0.01169, ecapa_loss=0.0002395, whisper_loss=0.09494, over 3913346.57 frames. ], batch size: 88, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:03:48,273 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 18:04:07,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=681320.0, ans=0.125 2024-08-10 18:04:11,696 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-10 18:04:14,262 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.74 vs. limit=22.5 2024-08-10 18:04:29,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=681420.0, ans=0.0 2024-08-10 18:04:31,943 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 18:04:32,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=681420.0, ans=0.05 2024-08-10 18:04:56,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=681620.0, ans=0.035 2024-08-10 18:04:57,177 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 18:05:09,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10200, loss[loss=0.1125, beats_loss=0.01038, ecapa_loss=0.0002497, whisper_loss=0.09961, over 18286.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01173, ecapa_loss=0.0002378, whisper_loss=0.09497, over 3897704.03 frames. 
], batch size: 72, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:05:09,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=681720.0, ans=0.125 2024-08-10 18:05:35,244 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 19 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-10 18:05:47,639 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 18:05:54,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+01 2.852e+01 3.122e+01 3.821e+01 7.643e+01, threshold=6.244e+01, percent-clipped=3.0 2024-08-10 18:06:13,984 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-10 18:06:15,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=682120.0, ans=0.125 2024-08-10 18:06:28,075 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10250, loss[loss=0.129, beats_loss=0.008518, ecapa_loss=0.0003129, whisper_loss=0.1173, over 18056.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01164, ecapa_loss=0.0002379, whisper_loss=0.09564, over 3908612.24 frames. ], batch size: 73, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:06:34,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=682220.0, ans=0.0 2024-08-10 18:06:37,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=682220.0, ans=0.125 2024-08-10 18:06:40,218 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 18:06:50,879 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-10 18:06:54,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=682320.0, ans=0.0 2024-08-10 18:07:01,155 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-10 18:07:07,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=682420.0, ans=0.125 2024-08-10 18:07:09,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=682420.0, ans=0.0 2024-08-10 18:07:46,374 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10300, loss[loss=0.09629, beats_loss=0.01372, ecapa_loss=0.0002629, whisper_loss=0.07994, over 21213.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01173, ecapa_loss=0.0002363, whisper_loss=0.09505, over 3883245.96 frames. ], batch size: 92, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:07:59,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=682720.0, ans=0.125 2024-08-10 18:08:19,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2024-08-10 18:08:28,008 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 18:08:29,580 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.976e+01 3.282e+01 3.794e+01 5.948e+01, threshold=6.564e+01, percent-clipped=0.0 2024-08-10 18:08:37,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=683020.0, ans=0.04949747468305833 2024-08-10 18:08:41,727 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-10 18:08:52,187 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 18:08:52,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=12.0 2024-08-10 18:09:02,178 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10350, loss[loss=0.1034, beats_loss=0.008923, ecapa_loss=0.0002828, whisper_loss=0.09161, over 18376.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01169, ecapa_loss=0.0002358, whisper_loss=0.09544, over 3898481.06 frames. ], batch size: 74, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:09:20,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=683320.0, ans=0.2 2024-08-10 18:09:38,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=683420.0, ans=0.0 2024-08-10 18:09:38,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=683420.0, ans=0.125 2024-08-10 18:09:40,929 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2024-08-10 18:09:54,578 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 18:10:01,067 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-10 18:10:04,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=683620.0, ans=0.125 2024-08-10 18:10:13,675 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 18:10:14,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=683620.0, ans=0.125 2024-08-10 18:10:20,459 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10400, loss[loss=0.1181, beats_loss=0.01165, ecapa_loss=0.0002191, whisper_loss=0.1043, over 23105.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01174, ecapa_loss=0.0002354, whisper_loss=0.09483, over 3882816.92 frames. ], batch size: 92, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:10:22,246 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-10 18:10:25,459 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 18:10:46,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=683820.0, ans=0.2 2024-08-10 18:10:47,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=683820.0, ans=0.2 2024-08-10 18:10:48,936 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-10 18:10:56,887 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 18:10:57,812 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=12.0 2024-08-10 18:11:02,501 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.804e+01 3.184e+01 3.674e+01 7.007e+01, threshold=6.369e+01, percent-clipped=1.0 2024-08-10 18:11:03,998 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 18:11:09,684 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
21 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 18:11:11,385 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 18:11:13,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=684020.0, ans=0.125 2024-08-10 18:11:23,590 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 18 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-10 18:11:31,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.55 vs. limit=8.0 2024-08-10 18:11:34,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10450, loss[loss=0.07377, beats_loss=0.01741, ecapa_loss=0.0002341, whisper_loss=0.05401, over 14148.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01183, ecapa_loss=0.0002362, whisper_loss=0.09422, over 3859240.35 frames. ], batch size: 61, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:11:35,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=684220.0, ans=0.125 2024-08-10 18:11:35,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=684220.0, ans=0.0 2024-08-10 18:11:41,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=684220.0, ans=0.125 2024-08-10 18:11:49,677 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 29 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-10 18:11:55,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=684320.0, ans=0.125 2024-08-10 18:12:03,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=684320.0, ans=0.2 2024-08-10 18:12:06,548 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
36 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 18:12:17,057 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-10 18:12:17,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=684420.0, ans=0.125 2024-08-10 18:12:20,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=684420.0, ans=0.0 2024-08-10 18:12:25,276 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 18:12:28,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.56 vs. limit=22.5 2024-08-10 18:12:46,811 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 18:12:54,236 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10500, loss[loss=0.08474, beats_loss=0.01505, ecapa_loss=0.0002105, whisper_loss=0.06759, over 18262.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01176, ecapa_loss=0.0002353, whisper_loss=0.09503, over 3873735.04 frames. ], batch size: 75, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:13:09,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=684820.0, ans=0.125 2024-08-10 18:13:11,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=684820.0, ans=0.125 2024-08-10 18:13:15,978 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 18:13:16,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=684820.0, ans=0.125 2024-08-10 18:13:35,859 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.920e+01 3.144e+01 3.815e+01 6.100e+01, threshold=6.288e+01, percent-clipped=0.0 2024-08-10 18:14:02,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.37 vs. limit=15.0 2024-08-10 18:14:09,634 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10550, loss[loss=0.1043, beats_loss=0.01347, ecapa_loss=0.0002508, whisper_loss=0.08829, over 22609.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01172, ecapa_loss=0.0002351, whisper_loss=0.09524, over 3852011.16 frames. ], batch size: 92, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:14:10,116 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.041e-02 2024-08-10 18:14:25,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=685320.0, ans=0.125 2024-08-10 18:14:25,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=685320.0, ans=0.0 2024-08-10 18:14:34,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=685320.0, ans=0.125 2024-08-10 18:14:54,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685420.0, ans=0.1 2024-08-10 18:15:05,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=685520.0, ans=0.05 2024-08-10 18:15:05,322 INFO [scaling.py:214] (2/4) 
ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=685520.0, ans=0.125 2024-08-10 18:15:14,651 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 18:15:21,872 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 37 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 18:15:28,776 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10600, loss[loss=0.1074, beats_loss=0.01411, ecapa_loss=0.0002123, whisper_loss=0.09112, over 20372.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01173, ecapa_loss=0.0002363, whisper_loss=0.09507, over 3834597.08 frames. ], batch size: 84, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:15:43,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=685820.0, ans=0.125 2024-08-10 18:15:43,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=685820.0, ans=0.125 2024-08-10 18:15:46,413 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-10 18:15:49,426 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 18:15:59,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0 2024-08-10 18:16:01,851 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
21 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-10 18:16:03,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=685920.0, ans=0.125 2024-08-10 18:16:04,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=685920.0, ans=0.0 2024-08-10 18:16:07,858 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 18:16:12,068 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.764e+01 3.108e+01 3.489e+01 4.887e+01, threshold=6.215e+01, percent-clipped=0.0 2024-08-10 18:16:20,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=686020.0, ans=0.125 2024-08-10 18:16:25,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686020.0, ans=0.1 2024-08-10 18:16:46,479 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10650, loss[loss=0.09176, beats_loss=0.01172, ecapa_loss=0.0002093, whisper_loss=0.07794, over 21810.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01173, ecapa_loss=0.0002352, whisper_loss=0.09552, over 3845258.58 frames. ], batch size: 86, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:16:49,684 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 18:16:50,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=686220.0, ans=0.125 2024-08-10 18:16:54,957 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 18:17:11,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=686320.0, ans=0.125 2024-08-10 18:17:15,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=686320.0, ans=0.125 2024-08-10 18:17:41,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=686520.0, ans=0.2 2024-08-10 18:17:43,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686520.0, ans=0.1 2024-08-10 18:17:52,160 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 30 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 18:18:04,524 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10700, loss[loss=0.1044, beats_loss=0.01007, ecapa_loss=0.0002247, whisper_loss=0.09206, over 17575.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01169, ecapa_loss=0.0002321, whisper_loss=0.09619, over 3896095.99 frames. ], batch size: 70, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:18:09,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=686720.0, ans=0.025 2024-08-10 18:18:25,188 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=6.0 2024-08-10 18:18:30,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.33 vs. limit=6.0 2024-08-10 18:18:37,232 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
21 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 18:18:47,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.883e+01 3.231e+01 3.765e+01 5.379e+01, threshold=6.463e+01, percent-clipped=0.0 2024-08-10 18:18:53,179 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-10 18:19:04,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5 2024-08-10 18:19:07,537 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-10 18:19:07,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=687120.0, ans=0.0 2024-08-10 18:19:09,365 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 18:19:23,248 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10750, loss[loss=0.1083, beats_loss=0.01324, ecapa_loss=0.0002352, whisper_loss=0.09267, over 22860.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01171, ecapa_loss=0.0002328, whisper_loss=0.09574, over 3915918.69 frames. 
], batch size: 95, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:19:23,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=687220.0, ans=0.125 2024-08-10 18:19:29,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=687220.0, ans=0.125 2024-08-10 18:19:42,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=687320.0, ans=0.0 2024-08-10 18:19:51,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=687320.0, ans=0.125 2024-08-10 18:20:04,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=687420.0, ans=0.0 2024-08-10 18:20:08,385 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 18:20:18,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=15.0 2024-08-10 18:20:31,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=687620.0, ans=0.0 2024-08-10 18:20:33,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=687620.0, ans=0.125 2024-08-10 18:20:35,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=687620.0, ans=0.2 2024-08-10 18:20:37,033 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-10 18:20:40,705 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10800, loss[loss=0.09498, beats_loss=0.01284, ecapa_loss=0.000308, whisper_loss=0.07906, over 15106.00 frames. 
], tot_loss[loss=0.109, beats_loss=0.01183, ecapa_loss=0.0002329, whisper_loss=0.09488, over 3896926.68 frames. ], batch size: 66, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:20:52,872 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.828e-02 2024-08-10 18:21:05,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=687820.0, ans=0.125 2024-08-10 18:21:11,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=687920.0, ans=0.5 2024-08-10 18:21:23,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+01 2.760e+01 3.130e+01 3.473e+01 5.037e+01, threshold=6.260e+01, percent-clipped=0.0 2024-08-10 18:21:30,685 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 18:21:47,782 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-10 18:21:57,332 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10850, loss[loss=0.1182, beats_loss=0.01229, ecapa_loss=0.0002508, whisper_loss=0.1034, over 21570.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01184, ecapa_loss=0.000233, whisper_loss=0.09484, over 3897923.29 frames. ], batch size: 90, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:21:59,414 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 18:22:01,047 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-10 18:22:18,629 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 12 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 18:22:53,586 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
32 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-10 18:23:15,028 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10900, loss[loss=0.08839, beats_loss=0.01335, ecapa_loss=0.0002203, whisper_loss=0.07284, over 18948.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01185, ecapa_loss=0.0002312, whisper_loss=0.09513, over 3896662.27 frames. ], batch size: 78, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:23:34,897 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 18:24:02,056 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+01 2.842e+01 3.313e+01 3.977e+01 6.808e+01, threshold=6.627e+01, percent-clipped=2.0 2024-08-10 18:24:09,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.95 vs. limit=15.0 2024-08-10 18:24:25,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.76 vs. limit=15.0 2024-08-10 18:24:28,421 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 18:24:36,710 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 10950, loss[loss=0.1237, beats_loss=0.00952, ecapa_loss=0.0002293, whisper_loss=0.1119, over 23372.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.0118, ecapa_loss=0.0002323, whisper_loss=0.09572, over 3932531.13 frames. ], batch size: 89, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:25:07,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=689420.0, ans=0.2 2024-08-10 18:25:26,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=689520.0, ans=0.0 2024-08-10 18:25:27,126 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 18:25:34,610 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 18:25:36,141 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-10 18:25:49,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689620.0, ans=0.1 2024-08-10 18:25:50,469 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 18:25:55,027 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11000, loss[loss=0.1021, beats_loss=0.0113, ecapa_loss=0.0003303, whisper_loss=0.08754, over 20772.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01183, ecapa_loss=0.0002323, whisper_loss=0.09595, over 3963552.23 frames. ], batch size: 88, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:26:04,709 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 18:26:26,109 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=12.0 2024-08-10 18:26:37,293 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 18:26:41,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.276e+01 2.857e+01 3.230e+01 3.620e+01 6.298e+01, threshold=6.460e+01, percent-clipped=0.0 2024-08-10 18:26:56,250 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0 2024-08-10 18:27:08,104 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
36 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 18:27:16,924 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11050, loss[loss=0.1218, beats_loss=0.01127, ecapa_loss=0.0002245, whisper_loss=0.1082, over 21509.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01178, ecapa_loss=0.0002312, whisper_loss=0.09579, over 3919003.41 frames. ], batch size: 84, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:27:54,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=690420.0, ans=0.125 2024-08-10 18:27:55,723 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 18:28:03,964 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 18:28:07,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=690520.0, ans=0.0 2024-08-10 18:28:36,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=690720.0, ans=0.0 2024-08-10 18:28:36,871 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11100, loss[loss=0.1078, beats_loss=0.01127, ecapa_loss=0.0001862, whisper_loss=0.09469, over 17284.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01175, ecapa_loss=0.0002317, whisper_loss=0.09626, over 3934259.22 frames. ], batch size: 66, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:28:42,045 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 18:28:57,790 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 18:29:03,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0 2024-08-10 18:29:07,551 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-10 18:29:08,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=690920.0, ans=0.04949747468305833 2024-08-10 18:29:16,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=690920.0, ans=0.0 2024-08-10 18:29:18,973 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.710e+01 3.196e+01 3.800e+01 5.125e+01, threshold=6.392e+01, percent-clipped=0.0 2024-08-10 18:29:29,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=691020.0, ans=0.125 2024-08-10 18:29:34,123 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0 2024-08-10 18:29:36,871 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 18:29:47,721 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 18:29:54,923 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11150, loss[loss=0.1237, beats_loss=0.01187, ecapa_loss=0.0002383, whisper_loss=0.1094, over 22655.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01172, ecapa_loss=0.0002314, whisper_loss=0.09621, over 3906522.44 frames. ], batch size: 92, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:30:19,458 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 18:31:14,074 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11200, loss[loss=0.07881, beats_loss=0.0137, ecapa_loss=0.0002667, whisper_loss=0.06245, over 15089.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01169, ecapa_loss=0.0002331, whisper_loss=0.09568, over 3888644.36 frames. 
], batch size: 65, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:31:18,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=691720.0, ans=0.125 2024-08-10 18:31:34,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=691820.0, ans=0.125 2024-08-10 18:31:40,915 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2024-08-10 18:31:45,238 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 29 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-10 18:31:56,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 2.779e+01 3.196e+01 3.588e+01 6.419e+01, threshold=6.392e+01, percent-clipped=1.0 2024-08-10 18:32:03,809 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 18:32:07,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=692020.0, ans=0.95 2024-08-10 18:32:21,892 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 11 from Vox, 41 fro AS 2024-08-10 18:32:31,836 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11250, loss[loss=0.1347, beats_loss=0.01059, ecapa_loss=0.0002524, whisper_loss=0.1216, over 22874.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01173, ecapa_loss=0.000233, whisper_loss=0.09631, over 3890949.07 frames. 
], batch size: 89, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:32:37,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=692220.0, ans=0.0 2024-08-10 18:32:40,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=692220.0, ans=0.0 2024-08-10 18:33:03,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=692420.0, ans=0.0 2024-08-10 18:33:06,938 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.533e+05 2024-08-10 18:33:08,470 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-10 18:33:16,113 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 18:33:22,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=692520.0, ans=0.1 2024-08-10 18:33:24,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=692520.0, ans=0.0 2024-08-10 18:33:26,330 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 18:33:43,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=692620.0, ans=0.0 2024-08-10 18:33:51,088 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11300, loss[loss=0.1004, beats_loss=0.009582, ecapa_loss=0.0002472, whisper_loss=0.08836, over 16696.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01167, ecapa_loss=0.0002329, whisper_loss=0.09626, over 3879637.03 frames. ], batch size: 67, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:33:51,305 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
26 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 18:33:53,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=692720.0, ans=0.125 2024-08-10 18:34:00,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=692720.0, ans=0.0 2024-08-10 18:34:35,505 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.891e+01 3.346e+01 3.835e+01 5.621e+01, threshold=6.692e+01, percent-clipped=0.0 2024-08-10 18:34:37,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693020.0, ans=0.1 2024-08-10 18:34:59,421 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2024-08-10 18:35:09,109 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11350, loss[loss=0.1233, beats_loss=0.008571, ecapa_loss=0.0002458, whisper_loss=0.1123, over 21975.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.0116, ecapa_loss=0.0002338, whisper_loss=0.09651, over 3861844.32 frames. ], batch size: 87, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:35:33,572 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 18:35:40,967 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 18:36:01,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=693520.0, ans=0.1 2024-08-10 18:36:11,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=693620.0, ans=0.125 2024-08-10 18:36:16,519 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
17 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-10 18:36:17,921 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-10 18:36:24,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11400, loss[loss=0.1029, beats_loss=0.01205, ecapa_loss=0.000233, whisper_loss=0.08853, over 21466.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01164, ecapa_loss=0.0002335, whisper_loss=0.09628, over 3854028.02 frames. ], batch size: 89, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:36:35,536 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 18:36:40,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=693820.0, ans=0.0 2024-08-10 18:36:45,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.37 vs. limit=15.0 2024-08-10 18:37:03,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.07 vs. limit=15.0 2024-08-10 18:37:07,044 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.890e+01 3.279e+01 3.857e+01 6.641e+01, threshold=6.557e+01, percent-clipped=0.0 2024-08-10 18:37:20,886 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
17 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 18:37:30,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=694120.0, ans=0.1 2024-08-10 18:37:32,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694120.0, ans=0.1 2024-08-10 18:37:32,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=694120.0, ans=0.0 2024-08-10 18:37:34,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=694120.0, ans=0.0 2024-08-10 18:37:35,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=694120.0, ans=0.125 2024-08-10 18:37:39,652 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11450, loss[loss=0.1485, beats_loss=0.008665, ecapa_loss=0.0002618, whisper_loss=0.1372, over 14541.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01169, ecapa_loss=0.000232, whisper_loss=0.09604, over 3846050.90 frames. ], batch size: 56, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:37:40,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=694220.0, ans=0.125 2024-08-10 18:37:52,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694220.0, ans=0.1 2024-08-10 18:37:57,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=694320.0, ans=0.125 2024-08-10 18:37:58,320 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 18:38:12,953 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
21 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 18:38:15,120 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2024-08-10 18:38:17,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=694420.0, ans=0.1 2024-08-10 18:38:18,128 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2024-08-10 18:38:38,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=694520.0, ans=0.0 2024-08-10 18:38:53,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2024-08-10 18:38:57,185 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11500, loss[loss=0.1066, beats_loss=0.009615, ecapa_loss=0.0002713, whisper_loss=0.09429, over 19359.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01172, ecapa_loss=0.0002313, whisper_loss=0.0957, over 3854597.00 frames. ], batch size: 81, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:39:06,531 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 18:39:24,416 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.66 vs. 
limit=15.0 2024-08-10 18:39:29,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=694920.0, ans=0.0 2024-08-10 18:39:37,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=694920.0, ans=0.125 2024-08-10 18:39:40,556 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.741e+01 3.082e+01 3.618e+01 5.964e+01, threshold=6.164e+01, percent-clipped=0.0 2024-08-10 18:39:41,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=694920.0, ans=0.0 2024-08-10 18:39:50,486 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 18:40:13,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=695220.0, ans=0.125 2024-08-10 18:40:14,780 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11550, loss[loss=0.1142, beats_loss=0.01062, ecapa_loss=0.0002243, whisper_loss=0.1014, over 22623.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01169, ecapa_loss=0.000232, whisper_loss=0.09558, over 3850600.19 frames. ], batch size: 87, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:40:21,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.60 vs. limit=15.0 2024-08-10 18:40:58,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=695420.0, ans=0.125 2024-08-10 18:41:07,158 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.570e-02 2024-08-10 18:41:14,704 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
26 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 18:41:21,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=695620.0, ans=0.125 2024-08-10 18:41:33,423 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11600, loss[loss=0.09138, beats_loss=0.01197, ecapa_loss=0.0002476, whisper_loss=0.07693, over 20467.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01174, ecapa_loss=0.0002315, whisper_loss=0.095, over 3843126.17 frames. ], batch size: 84, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:41:44,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=695720.0, ans=0.125 2024-08-10 18:41:46,016 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 18:41:49,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=695820.0, ans=0.2 2024-08-10 18:42:16,259 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.151e+01 2.928e+01 3.314e+01 3.952e+01 8.355e+01, threshold=6.627e+01, percent-clipped=1.0 2024-08-10 18:42:19,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=696020.0, ans=0.125 2024-08-10 18:42:43,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=696120.0, ans=0.125 2024-08-10 18:42:49,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11650, loss[loss=0.102, beats_loss=0.01266, ecapa_loss=0.0002489, whisper_loss=0.08684, over 14307.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01169, ecapa_loss=0.0002313, whisper_loss=0.09443, over 3846201.09 frames. 
], batch size: 59, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:43:09,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=696320.0, ans=0.125 2024-08-10 18:43:15,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=696320.0, ans=0.125 2024-08-10 18:43:35,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=696520.0, ans=0.125 2024-08-10 18:43:59,724 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11700, loss[loss=0.1258, beats_loss=0.01085, ecapa_loss=0.0002473, whisper_loss=0.1124, over 17980.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01166, ecapa_loss=0.0002334, whisper_loss=0.09474, over 3834307.90 frames. ], batch size: 72, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:44:05,278 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 18:44:29,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=696920.0, ans=0.025 2024-08-10 18:44:39,290 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.923e+01 3.356e+01 3.959e+01 5.415e+01, threshold=6.712e+01, percent-clipped=0.0 2024-08-10 18:44:44,466 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-10 18:44:46,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=697020.0, ans=0.125 2024-08-10 18:44:55,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5 2024-08-10 18:44:56,068 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
27 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 18:45:09,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=697220.0, ans=0.1 2024-08-10 18:45:10,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11750, loss[loss=0.118, beats_loss=0.008775, ecapa_loss=0.0003002, whisper_loss=0.1062, over 15786.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01175, ecapa_loss=0.0002333, whisper_loss=0.09462, over 3880250.60 frames. ], batch size: 65, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:45:30,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=697320.0, ans=0.0 2024-08-10 18:45:33,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=697320.0, ans=0.0 2024-08-10 18:45:53,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=697520.0, ans=0.125 2024-08-10 18:46:05,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=697620.0, ans=0.125 2024-08-10 18:46:07,509 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 31 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-10 18:46:10,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=697620.0, ans=0.125 2024-08-10 18:46:19,358 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11800, loss[loss=0.1291, beats_loss=0.01046, ecapa_loss=0.000205, whisper_loss=0.1166, over 16501.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01188, ecapa_loss=0.0002308, whisper_loss=0.09454, over 3894445.85 frames. 
], batch size: 64, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:46:19,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=697720.0, ans=0.125 2024-08-10 18:46:21,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=697720.0, ans=0.125 2024-08-10 18:46:25,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=697720.0, ans=0.05 2024-08-10 18:46:39,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=697820.0, ans=0.0 2024-08-10 18:46:40,702 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-10 18:46:48,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0 2024-08-10 18:46:49,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.82 vs. limit=15.0 2024-08-10 18:46:58,860 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+01 3.039e+01 3.457e+01 3.903e+01 6.365e+01, threshold=6.915e+01, percent-clipped=0.0 2024-08-10 18:47:10,318 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.94 vs. limit=22.5 2024-08-10 18:47:10,960 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
19 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 18:47:23,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=698120.0, ans=0.125 2024-08-10 18:47:25,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=698120.0, ans=0.125 2024-08-10 18:47:30,754 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11850, loss[loss=0.1089, beats_loss=0.01347, ecapa_loss=0.0002212, whisper_loss=0.09318, over 21717.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01187, ecapa_loss=0.0002305, whisper_loss=0.09509, over 3945547.25 frames. ], batch size: 90, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:47:42,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=698220.0, ans=0.125 2024-08-10 18:47:48,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=698320.0, ans=0.125 2024-08-10 18:47:49,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=698320.0, ans=0.015 2024-08-10 18:47:58,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=698420.0, ans=0.125 2024-08-10 18:47:59,046 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 18:48:04,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=698420.0, ans=0.1 2024-08-10 18:48:23,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=698520.0, ans=0.0 2024-08-10 18:48:39,126 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11900, loss[loss=0.1141, beats_loss=0.01227, ecapa_loss=0.000257, whisper_loss=0.0993, over 23040.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01187, ecapa_loss=0.0002305, whisper_loss=0.09509, over 3938946.96 frames. ], batch size: 96, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:48:55,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=698820.0, ans=0.0 2024-08-10 18:49:00,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=698820.0, ans=0.0 2024-08-10 18:49:08,031 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-10 18:49:14,702 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-10 18:49:17,181 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.855e+01 3.159e+01 3.498e+01 6.204e+01, threshold=6.318e+01, percent-clipped=0.0 2024-08-10 18:49:24,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=699020.0, ans=0.2 2024-08-10 18:49:25,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. 
limit=15.0 2024-08-10 18:49:26,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=699020.0, ans=0.0 2024-08-10 18:49:30,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=699020.0, ans=0.125 2024-08-10 18:49:40,055 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 18:49:41,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=699120.0, ans=0.125 2024-08-10 18:49:46,924 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 11950, loss[loss=0.1016, beats_loss=0.01234, ecapa_loss=0.0002324, whisper_loss=0.08694, over 16946.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01171, ecapa_loss=0.0002339, whisper_loss=0.09543, over 3887269.37 frames. ], batch size: 68, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:50:13,517 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 18:50:13,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=699420.0, ans=0.0 2024-08-10 18:50:15,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=699420.0, ans=0.125 2024-08-10 18:50:24,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=699420.0, ans=0.125 2024-08-10 18:50:53,570 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12000, loss[loss=0.09963, beats_loss=0.01102, ecapa_loss=0.0002504, whisper_loss=0.0861, over 16018.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01179, ecapa_loss=0.0002348, whisper_loss=0.0947, over 3876275.94 frames. 
], batch size: 64, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:50:53,571 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 18:51:35,512 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on ASR_libri: loss=0.2622, beats_loss=0, ecapa_loss=0.0007279, whisper_loss=0.255, over 922467.00 frames. 2024-08-10 18:51:54,216 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on SV_voxceleb1: loss=0.006203, beats_loss=0, ecapa_loss=0.0006203, whisper_loss=0, over 939242.00 frames. 2024-08-10 18:53:47,296 INFO [train_multi_KD3.py:1149] (2/4) Epoch 5, validation on AT_audioset: loss=0.02662, beats_loss=0.02662, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 18:53:47,301 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 18:53:54,093 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 11 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-10 18:54:05,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=699820.0, ans=0.125 2024-08-10 18:54:06,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=699820.0, ans=0.125 2024-08-10 18:54:25,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.732e+01 3.134e+01 3.531e+01 7.163e+01, threshold=6.268e+01, percent-clipped=1.0 2024-08-10 18:54:28,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.93 vs. limit=22.5 2024-08-10 18:54:31,889 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
23 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 18:54:40,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=700120.0, ans=0.0 2024-08-10 18:54:44,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=700120.0, ans=0.1 2024-08-10 18:54:48,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=700120.0, ans=0.125 2024-08-10 18:54:54,932 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12050, loss[loss=0.09732, beats_loss=0.01418, ecapa_loss=0.0002096, whisper_loss=0.08104, over 14566.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01187, ecapa_loss=0.000233, whisper_loss=0.0939, over 3842430.61 frames. ], batch size: 60, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:55:26,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=700420.0, ans=0.125 2024-08-10 18:55:27,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=700420.0, ans=0.2 2024-08-10 18:55:28,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=700420.0, ans=0.125 2024-08-10 18:55:36,321 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 18:55:39,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=700520.0, ans=0.125 2024-08-10 18:55:39,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=700520.0, ans=0.07 2024-08-10 18:55:49,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=700620.0, ans=0.125 2024-08-10 18:56:02,405 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12100, loss[loss=0.1223, beats_loss=0.009915, ecapa_loss=0.0002891, whisper_loss=0.1095, over 22457.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01185, ecapa_loss=0.0002347, whisper_loss=0.0946, over 3878745.05 frames. ], batch size: 90, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:56:08,049 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.91 vs. limit=22.5 2024-08-10 18:56:11,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=700720.0, ans=0.125 2024-08-10 18:56:14,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=700820.0, ans=0.125 2024-08-10 18:56:28,738 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.23 vs. 
limit=15.0 2024-08-10 18:56:32,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=700920.0, ans=0.0 2024-08-10 18:56:33,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=700920.0, ans=0.125 2024-08-10 18:56:40,113 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.774e+01 3.188e+01 3.789e+01 5.825e+01, threshold=6.376e+01, percent-clipped=0.0 2024-08-10 18:56:40,415 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 18:56:42,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=15.0 2024-08-10 18:56:48,540 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 18:56:49,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=12.0 2024-08-10 18:56:50,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=701020.0, ans=0.0 2024-08-10 18:56:53,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=22.5 2024-08-10 18:56:56,553 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 18:57:09,763 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12150, loss[loss=0.09452, beats_loss=0.01096, ecapa_loss=0.0002497, whisper_loss=0.08106, over 21629.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01174, ecapa_loss=0.0002358, whisper_loss=0.09507, over 3876209.00 frames. 
], batch size: 86, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:57:24,367 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 18:57:29,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=701320.0, ans=0.1 2024-08-10 18:57:34,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=701320.0, ans=0.0 2024-08-10 18:57:34,995 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 18:57:44,235 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 18:58:17,821 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12200, loss[loss=0.124, beats_loss=0.01432, ecapa_loss=0.000171, whisper_loss=0.108, over 16941.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.0117, ecapa_loss=0.0002349, whisper_loss=0.09506, over 3852220.23 frames. ], batch size: 63, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:58:32,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=701820.0, ans=0.1 2024-08-10 18:58:38,519 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-10 18:58:39,763 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-10 18:58:40,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=701820.0, ans=0.5 2024-08-10 18:58:49,666 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 18:58:55,844 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.903e+01 3.177e+01 3.659e+01 7.236e+01, threshold=6.353e+01, percent-clipped=1.0 2024-08-10 18:58:56,042 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 18:59:20,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702120.0, ans=0.1 2024-08-10 18:59:25,044 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12250, loss[loss=0.09581, beats_loss=0.01391, ecapa_loss=0.0002333, whisper_loss=0.07957, over 18910.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01164, ecapa_loss=0.0002362, whisper_loss=0.09536, over 3886260.96 frames. ], batch size: 78, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:59:30,752 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 18:59:34,551 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 18:59:43,788 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
32 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 18:59:45,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=702320.0, ans=0.0 2024-08-10 18:59:49,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=702320.0, ans=0.125 2024-08-10 18:59:53,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=702420.0, ans=0.125 2024-08-10 18:59:55,035 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 18:59:57,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=702420.0, ans=0.125 2024-08-10 19:00:32,593 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12300, loss[loss=0.1076, beats_loss=0.009687, ecapa_loss=0.0002349, whisper_loss=0.0956, over 17145.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01161, ecapa_loss=0.0002364, whisper_loss=0.09524, over 3891178.45 frames. ], batch size: 68, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:00:42,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=702720.0, ans=0.125 2024-08-10 19:00:43,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5 2024-08-10 19:00:45,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.65 vs. 
limit=15.0 2024-08-10 19:01:09,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=702920.0, ans=0.125 2024-08-10 19:01:10,058 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.851e+01 3.322e+01 3.771e+01 6.110e+01, threshold=6.644e+01, percent-clipped=0.0 2024-08-10 19:01:22,137 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 21 from Vox, 16 fro AS 2024-08-10 19:01:39,860 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12350, loss[loss=0.08969, beats_loss=0.01358, ecapa_loss=0.0002548, whisper_loss=0.07356, over 12760.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01152, ecapa_loss=0.0002377, whisper_loss=0.09609, over 3887022.90 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:01:40,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=22.5 2024-08-10 19:01:48,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=703220.0, ans=0.125 2024-08-10 19:01:55,428 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 19:02:03,736 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 19:02:21,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=703420.0, ans=0.125 2024-08-10 19:02:33,804 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 19:02:37,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703620.0, ans=0.125 2024-08-10 19:02:52,894 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12400, loss[loss=0.107, beats_loss=0.01387, ecapa_loss=0.0001943, whisper_loss=0.09115, over 21972.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01161, ecapa_loss=0.0002366, whisper_loss=0.09557, over 3888517.05 frames. ], batch size: 87, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:02:54,382 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 19:02:57,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2024-08-10 19:03:29,869 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-10 19:03:31,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.592e+01 3.077e+01 3.649e+01 6.276e+01, threshold=6.154e+01, percent-clipped=0.0 2024-08-10 19:03:32,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.10 vs. limit=10.0 2024-08-10 19:03:34,876 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 19:03:39,262 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 19:03:53,285 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 19:03:56,089 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
13 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-10 19:04:03,107 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12450, loss[loss=0.08729, beats_loss=0.0124, ecapa_loss=0.0002328, whisper_loss=0.07256, over 20084.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01166, ecapa_loss=0.000236, whisper_loss=0.09468, over 3861673.20 frames. ], batch size: 81, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:04:11,087 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 19:04:19,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=704320.0, ans=0.1 2024-08-10 19:04:26,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704320.0, ans=0.1 2024-08-10 19:04:27,062 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 32 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-10 19:04:28,366 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-10 19:04:31,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=704420.0, ans=0.0 2024-08-10 19:04:45,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=704520.0, ans=0.07 2024-08-10 19:04:55,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=704520.0, ans=0.125 2024-08-10 19:04:56,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=704620.0, ans=0.125 2024-08-10 19:05:02,464 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-10 19:05:08,362 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 19:05:12,226 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12500, loss[loss=0.09087, beats_loss=0.01496, ecapa_loss=0.0001758, whisper_loss=0.07415, over 22004.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01176, ecapa_loss=0.0002348, whisper_loss=0.09432, over 3877416.92 frames. ], batch size: 88, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:05:31,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.28 vs. limit=10.0 2024-08-10 19:05:44,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=704920.0, ans=0.0 2024-08-10 19:05:51,408 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+01 2.864e+01 3.204e+01 3.870e+01 6.784e+01, threshold=6.407e+01, percent-clipped=3.0 2024-08-10 19:06:07,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=705120.0, ans=0.125 2024-08-10 19:06:17,942 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-10 19:06:19,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=705120.0, ans=0.1 2024-08-10 19:06:21,556 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12550, loss[loss=0.128, beats_loss=0.01179, ecapa_loss=0.0002203, whisper_loss=0.114, over 22937.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01173, ecapa_loss=0.0002345, whisper_loss=0.09437, over 3861193.99 frames. ], batch size: 91, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:06:26,013 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. 
limit=6.0 2024-08-10 19:06:38,147 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 19:06:51,298 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-08-10 19:06:53,541 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-10 19:06:56,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=705420.0, ans=0.0 2024-08-10 19:07:34,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12600, loss[loss=0.1242, beats_loss=0.008431, ecapa_loss=0.000307, whisper_loss=0.1127, over 17507.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01177, ecapa_loss=0.0002336, whisper_loss=0.09462, over 3888554.84 frames. ], batch size: 70, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:07:38,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=705720.0, ans=0.0 2024-08-10 19:07:40,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=705720.0, ans=0.0 2024-08-10 19:07:43,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=705720.0, ans=0.125 2024-08-10 19:08:12,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.787e+01 3.074e+01 3.484e+01 6.689e+01, threshold=6.148e+01, percent-clipped=1.0 2024-08-10 19:08:16,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=706020.0, ans=0.125 2024-08-10 19:08:30,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706120.0, ans=0.1 2024-08-10 19:08:30,576 
INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706120.0, ans=0.1 2024-08-10 19:08:36,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0 2024-08-10 19:08:38,493 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 19:08:41,032 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 19:08:42,084 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12650, loss[loss=0.1157, beats_loss=0.01034, ecapa_loss=0.0002107, whisper_loss=0.1033, over 14733.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01179, ecapa_loss=0.0002327, whisper_loss=0.09455, over 3830460.13 frames. ], batch size: 57, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:08:50,520 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 19:08:56,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=706320.0, ans=0.0 2024-08-10 19:09:10,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706420.0, ans=0.1 2024-08-10 19:09:14,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=15.0 2024-08-10 19:09:17,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=706420.0, ans=0.5 2024-08-10 19:09:27,775 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 19:09:31,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=706520.0, ans=0.09899494936611666 2024-08-10 19:09:41,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-10 19:09:49,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12700, loss[loss=0.1219, beats_loss=0.01037, ecapa_loss=0.0002624, whisper_loss=0.1089, over 22057.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01172, ecapa_loss=0.0002333, whisper_loss=0.09538, over 3837103.97 frames. ], batch size: 94, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:09:50,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=706720.0, ans=0.0 2024-08-10 19:09:52,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=706720.0, ans=0.2 2024-08-10 19:10:12,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=706820.0, ans=0.125 2024-08-10 19:10:15,263 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 11 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 19:10:26,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=706920.0, ans=0.125 2024-08-10 19:10:27,286 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.805e+01 3.118e+01 3.753e+01 7.808e+01, threshold=6.236e+01, percent-clipped=1.0 2024-08-10 19:10:31,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=707020.0, ans=0.125 2024-08-10 19:10:38,234 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
21 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 19:10:45,537 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.51 vs. limit=15.0 2024-08-10 19:10:57,198 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12750, loss[loss=0.0917, beats_loss=0.01396, ecapa_loss=0.0002006, whisper_loss=0.07574, over 17106.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01183, ecapa_loss=0.0002329, whisper_loss=0.09447, over 3851473.60 frames. ], batch size: 69, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:11:11,053 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 19:11:26,706 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-10 19:11:30,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=15.0 2024-08-10 19:11:33,248 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-10 19:11:49,548 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 19:12:04,639 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12800, loss[loss=0.1092, beats_loss=0.01215, ecapa_loss=0.0002211, whisper_loss=0.09482, over 23133.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01177, ecapa_loss=0.0002346, whisper_loss=0.09512, over 3864145.91 frames. ], batch size: 93, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:12:05,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=707720.0, ans=0.0 2024-08-10 19:12:26,373 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 19:12:38,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=707920.0, ans=0.125 2024-08-10 19:12:39,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-10 19:12:41,919 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.755e+01 3.116e+01 3.558e+01 5.514e+01, threshold=6.233e+01, percent-clipped=0.0 2024-08-10 19:12:44,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708020.0, ans=0.1 2024-08-10 19:12:44,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=708020.0, ans=0.125 2024-08-10 19:12:46,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=708020.0, ans=0.95 2024-08-10 19:13:05,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=708120.0, ans=0.0 2024-08-10 19:13:11,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12850, loss[loss=0.08545, beats_loss=0.01357, ecapa_loss=0.0002478, whisper_loss=0.0694, over 21374.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01187, ecapa_loss=0.0002341, whisper_loss=0.09419, over 3886534.13 frames. 
], batch size: 92, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:13:11,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=708220.0, ans=0.0 2024-08-10 19:13:38,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708420.0, ans=0.1 2024-08-10 19:13:45,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=708420.0, ans=0.2 2024-08-10 19:14:00,388 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 19:14:11,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=708620.0, ans=0.2 2024-08-10 19:14:11,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=708620.0, ans=15.0 2024-08-10 19:14:18,665 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12900, loss[loss=0.1077, beats_loss=0.01013, ecapa_loss=0.000298, whisper_loss=0.09455, over 21465.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01181, ecapa_loss=0.0002343, whisper_loss=0.09388, over 3858612.27 frames. ], batch size: 90, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:14:21,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=708720.0, ans=0.125 2024-08-10 19:14:23,807 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.46 vs. limit=10.0 2024-08-10 19:14:26,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.11 vs. 
limit=22.5 2024-08-10 19:14:38,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-08-10 19:14:43,017 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 19:14:49,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=708920.0, ans=0.0 2024-08-10 19:14:50,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=708920.0, ans=0.125 2024-08-10 19:14:54,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=708920.0, ans=0.125 2024-08-10 19:14:55,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.870e+01 3.277e+01 3.550e+01 6.009e+01, threshold=6.554e+01, percent-clipped=0.0 2024-08-10 19:15:00,566 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 33 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-10 19:15:08,793 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 19:15:11,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=709120.0, ans=0.0 2024-08-10 19:15:15,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=709120.0, ans=0.0 2024-08-10 19:15:17,882 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 19:15:24,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 12950, loss[loss=0.09303, beats_loss=0.0144, ecapa_loss=0.0002179, whisper_loss=0.07645, over 20055.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01171, ecapa_loss=0.0002351, whisper_loss=0.09392, over 3814820.55 frames. 
], batch size: 84, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:15:28,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=709220.0, ans=0.125 2024-08-10 19:15:37,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.47 vs. limit=10.0 2024-08-10 19:15:46,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2024-08-10 19:15:54,529 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 11 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 19:16:30,107 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13000, loss[loss=0.09515, beats_loss=0.01361, ecapa_loss=0.000246, whisper_loss=0.07907, over 23385.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01171, ecapa_loss=0.000233, whisper_loss=0.09491, over 3849238.05 frames. ], batch size: 94, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:16:55,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2024-08-10 19:17:06,971 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.263e+01 2.942e+01 3.329e+01 3.753e+01 5.609e+01, threshold=6.657e+01, percent-clipped=0.0 2024-08-10 19:17:10,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710020.0, ans=0.1 2024-08-10 19:17:24,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710120.0, ans=0.1 2024-08-10 19:17:34,703 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
14 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 19:17:35,824 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13050, loss[loss=0.09409, beats_loss=0.01173, ecapa_loss=0.0001866, whisper_loss=0.08049, over 13859.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01177, ecapa_loss=0.0002335, whisper_loss=0.09469, over 3828443.22 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:17:54,431 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 19:17:59,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=710320.0, ans=0.125 2024-08-10 19:18:00,702 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 19:18:02,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=710420.0, ans=0.95 2024-08-10 19:18:12,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=710420.0, ans=0.07 2024-08-10 19:18:22,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=710520.0, ans=0.2 2024-08-10 19:18:26,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=710520.0, ans=0.0 2024-08-10 19:18:33,089 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 19:18:38,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=710620.0, ans=0.125 2024-08-10 19:18:42,108 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13100, loss[loss=0.09974, beats_loss=0.01488, ecapa_loss=0.000299, whisper_loss=0.08187, over 21240.00 frames. 
], tot_loss[loss=0.1083, beats_loss=0.01185, ecapa_loss=0.0002323, whisper_loss=0.09415, over 3851001.02 frames. ], batch size: 91, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:18:46,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=710720.0, ans=0.125
2024-08-10 19:18:49,249 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 16 from Vox, 21 from AS
2024-08-10 19:18:49,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=710720.0, ans=0.125
2024-08-10 19:19:03,987 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS
2024-08-10 19:19:06,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710820.0, ans=0.1
2024-08-10 19:19:09,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=710920.0, ans=0.2
2024-08-10 19:19:17,060 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 from AS
2024-08-10 19:19:19,388 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+01 2.880e+01 3.300e+01 3.880e+01 5.965e+01, threshold=6.600e+01, percent-clipped=0.0
2024-08-10 19:19:21,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=711020.0, ans=0.2
2024-08-10 19:19:24,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.0
2024-08-10 19:19:34,429 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 33 from Vox, 33 from AS
2024-08-10 19:19:48,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13150, loss[loss=0.1165, beats_loss=0.01247, ecapa_loss=0.0001913, whisper_loss=0.1021, over 20752.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01176, ecapa_loss=0.0002327, whisper_loss=0.09432, over 3848974.86 frames. ], batch size: 78, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:19:53,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=711220.0, ans=0.125
2024-08-10 19:20:02,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0
2024-08-10 19:20:18,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=711420.0, ans=0.0
2024-08-10 19:20:34,597 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 10 from LS+wenet, 20 from Vox, 25 from AS
2024-08-10 19:20:38,616 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=15.0
2024-08-10 19:20:41,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=711520.0, ans=0.2
2024-08-10 19:20:42,404 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 24 from Vox, 16 from AS
2024-08-10 19:20:44,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=711620.0, ans=0.2
2024-08-10 19:20:58,678 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0
2024-08-10 19:20:59,024 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13200, loss[loss=0.1159, beats_loss=0.01195, ecapa_loss=0.0002422, whisper_loss=0.1015, over 22017.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01182, ecapa_loss=0.0002321, whisper_loss=0.09446, over 3876063.77 frames. ], batch size: 88, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:21:00,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=711720.0, ans=0.0
2024-08-10 19:21:03,007 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 17 from LS+wenet, 30 from Vox, 36 from AS
2024-08-10 19:21:03,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=711720.0, ans=0.0
2024-08-10 19:21:31,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=711920.0, ans=0.0
2024-08-10 19:21:37,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 3.006e+01 3.463e+01 3.966e+01 7.207e+01, threshold=6.927e+01, percent-clipped=1.0
2024-08-10 19:21:39,383 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.70 vs. limit=22.5
2024-08-10 19:21:41,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=712020.0, ans=0.125
2024-08-10 19:22:03,586 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 14 from Vox, 42 from AS
2024-08-10 19:22:05,815 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13250, loss[loss=0.1068, beats_loss=0.01068, ecapa_loss=0.000284, whisper_loss=0.09331, over 21281.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01177, ecapa_loss=0.0002342, whisper_loss=0.09481, over 3863733.07 frames. ], batch size: 92, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:22:07,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=712220.0, ans=0.0
2024-08-10 19:22:13,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0
2024-08-10 19:22:23,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=712320.0, ans=0.5
2024-08-10 19:22:25,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=712320.0, ans=0.5
2024-08-10 19:22:37,002 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 29 from Vox, 31 from AS
2024-08-10 19:22:44,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=712520.0, ans=0.125
2024-08-10 19:22:49,681 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 15 from Vox, 29 from AS
2024-08-10 19:22:53,855 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 from AS
2024-08-10 19:23:00,410 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 31 from Vox, 36 from AS
2024-08-10 19:23:05,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=712620.0, ans=0.1
2024-08-10 19:23:05,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=712620.0, ans=0.125
2024-08-10 19:23:11,666 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13300, loss[loss=0.126, beats_loss=0.01045, ecapa_loss=0.0002115, whisper_loss=0.1135, over 23042.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01174, ecapa_loss=0.0002339, whisper_loss=0.09482, over 3892353.73 frames. ], batch size: 89, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:23:18,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=712720.0, ans=0.125
2024-08-10 19:23:24,150 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS
2024-08-10 19:23:29,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=712820.0, ans=0.2
2024-08-10 19:23:29,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.23 vs. limit=15.0
2024-08-10 19:23:45,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=712920.0, ans=0.0
2024-08-10 19:23:46,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=712920.0, ans=0.0
2024-08-10 19:23:47,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=712920.0, ans=15.0
2024-08-10 19:23:48,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712920.0, ans=0.1
2024-08-10 19:23:48,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=712920.0, ans=0.2
2024-08-10 19:23:50,923 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.791e+01 3.143e+01 3.422e+01 5.648e+01, threshold=6.287e+01, percent-clipped=0.0
2024-08-10 19:23:56,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0
2024-08-10 19:24:00,549 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 from AS
2024-08-10 19:24:03,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=713020.0, ans=0.0
2024-08-10 19:24:06,951 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 from AS
2024-08-10 19:24:19,927 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13350, loss[loss=0.1279, beats_loss=0.008919, ecapa_loss=0.0002917, whisper_loss=0.1161, over 22348.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01172, ecapa_loss=0.0002332, whisper_loss=0.09536, over 3859310.40 frames. ], batch size: 94, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:24:20,060 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 29 from Vox, 30 from AS
2024-08-10 19:24:36,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=713320.0, ans=0.125
2024-08-10 19:24:39,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=713320.0, ans=0.125
2024-08-10 19:24:51,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=713420.0, ans=0.125
2024-08-10 19:24:55,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=713420.0, ans=0.0
2024-08-10 19:25:17,257 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0
2024-08-10 19:25:21,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=713620.0, ans=0.2
2024-08-10 19:25:25,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.76 vs. limit=10.0
2024-08-10 19:25:26,902 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13400, loss[loss=0.1081, beats_loss=0.01304, ecapa_loss=0.0002365, whisper_loss=0.0927, over 14085.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01177, ecapa_loss=0.0002318, whisper_loss=0.09516, over 3834926.96 frames. ], batch size: 57, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:25:43,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=713820.0, ans=0.125
2024-08-10 19:25:51,370 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 26 from LS+wenet, 32 from Vox, 37 from AS
2024-08-10 19:25:54,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=713920.0, ans=0.125
2024-08-10 19:26:05,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.716e+01 3.114e+01 3.677e+01 5.856e+01, threshold=6.229e+01, percent-clipped=0.0
2024-08-10 19:26:08,201 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 29 from Vox, 31 from AS
2024-08-10 19:26:08,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=714020.0, ans=0.125
2024-08-10 19:26:09,791 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 30 from Vox, 31 from AS
2024-08-10 19:26:18,055 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 from AS
2024-08-10 19:26:38,532 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13450, loss[loss=0.1063, beats_loss=0.01138, ecapa_loss=0.0002695, whisper_loss=0.09222, over 16763.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01185, ecapa_loss=0.0002324, whisper_loss=0.09411, over 3865351.12 frames. ], batch size: 69, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:27:03,847 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.724e-01
2024-08-10 19:27:04,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=714320.0, ans=0.125
2024-08-10 19:27:06,344 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 12 from Vox, 29 from AS
2024-08-10 19:27:13,564 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 from AS
2024-08-10 19:27:14,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=714420.0, ans=0.05
2024-08-10 19:27:14,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0
2024-08-10 19:27:31,186 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 from AS
2024-08-10 19:27:36,019 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 17 from Vox, 37 from AS
2024-08-10 19:27:40,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=714520.0, ans=0.0
2024-08-10 19:27:41,788 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 from AS
2024-08-10 19:27:45,898 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 26 from Vox, 30 from AS
2024-08-10 19:27:48,151 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 12 from Vox, 25 from AS
2024-08-10 19:27:49,824 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 20 from Vox, 38 from AS
2024-08-10 19:27:54,109 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 from AS
2024-08-10 19:28:18,085 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13500, loss[loss=0.1198, beats_loss=0.01214, ecapa_loss=0.0002075, whisper_loss=0.1055, over 21424.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01176, ecapa_loss=0.0002319, whisper_loss=0.09517, over 3849685.01 frames. ], batch size: 85, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:28:29,818 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 from AS
2024-08-10 19:29:06,285 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 16 from Vox, 34 from AS
2024-08-10 19:29:11,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.843e+01 3.302e+01 3.860e+01 1.367e+02, threshold=6.604e+01, percent-clipped=1.0
2024-08-10 19:29:16,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=715020.0, ans=0.2
2024-08-10 19:29:29,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0
2024-08-10 19:29:43,690 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13550, loss[loss=0.09663, beats_loss=0.009631, ecapa_loss=0.0003255, whisper_loss=0.08374, over 14386.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01177, ecapa_loss=0.0002321, whisper_loss=0.09467, over 3832070.97 frames. ], batch size: 58, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:29:49,323 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 from AS
2024-08-10 19:29:56,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=715320.0, ans=0.2
2024-08-10 19:30:02,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=715320.0, ans=0.125
2024-08-10 19:30:04,039 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.74 vs. limit=15.0
2024-08-10 19:30:05,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=715320.0, ans=0.125
2024-08-10 19:30:16,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=715420.0, ans=0.125
2024-08-10 19:30:19,356 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 20 from LS+wenet, 21 from Vox, 48 from AS
2024-08-10 19:30:25,657 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 from AS
2024-08-10 19:30:56,089 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13600, loss[loss=0.1037, beats_loss=0.009305, ecapa_loss=0.000302, whisper_loss=0.09137, over 18854.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01183, ecapa_loss=0.0002308, whisper_loss=0.09401, over 3830407.36 frames. ], batch size: 81, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:31:25,410 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 from AS
2024-08-10 19:31:40,549 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 3.018e+01 3.345e+01 4.176e+01 9.829e+01, threshold=6.690e+01, percent-clipped=2.0
2024-08-10 19:31:45,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=716020.0, ans=0.125
2024-08-10 19:31:52,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=716020.0, ans=0.0
2024-08-10 19:31:56,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=716120.0, ans=0.125
2024-08-10 19:32:11,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=716220.0, ans=0.125
2024-08-10 19:32:13,162 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13650, loss[loss=0.132, beats_loss=0.01124, ecapa_loss=0.0002423, whisper_loss=0.1183, over 23525.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01179, ecapa_loss=0.0002314, whisper_loss=0.09471, over 3820792.71 frames. ], batch size: 92, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:32:30,372 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 28 from Vox, 38 from AS
2024-08-10 19:32:39,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=716320.0, ans=0.0
2024-08-10 19:32:47,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=716420.0, ans=0.125
2024-08-10 19:32:58,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=716520.0, ans=0.125
2024-08-10 19:32:59,333 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 from AS
2024-08-10 19:33:06,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=716520.0, ans=0.125
2024-08-10 19:33:13,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=716620.0, ans=0.125
2024-08-10 19:33:29,298 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13700, loss[loss=0.09077, beats_loss=0.01315, ecapa_loss=0.000218, whisper_loss=0.07544, over 22080.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01178, ecapa_loss=0.0002345, whisper_loss=0.09425, over 3812057.06 frames. ], batch size: 87, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:33:48,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=716820.0, ans=0.1
2024-08-10 19:33:52,410 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 from AS
2024-08-10 19:34:06,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=716920.0, ans=0.125
2024-08-10 19:34:12,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 2.823e+01 3.317e+01 3.890e+01 6.067e+01, threshold=6.634e+01, percent-clipped=0.0
2024-08-10 19:34:28,255 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.09 vs. limit=22.5
2024-08-10 19:34:40,901 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 27 from Vox, 38 from AS
2024-08-10 19:34:46,783 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13750, loss[loss=0.1133, beats_loss=0.01091, ecapa_loss=0.0002679, whisper_loss=0.09967, over 22215.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.0117, ecapa_loss=0.0002356, whisper_loss=0.09455, over 3812881.20 frames. ], batch size: 90, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:34:46,896 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 from AS
2024-08-10 19:34:50,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=717220.0, ans=0.1
2024-08-10 19:35:28,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.12 vs. limit=22.5
2024-08-10 19:35:34,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=717520.0, ans=0.125
2024-08-10 19:35:49,945 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 21 from Vox, 33 from AS
2024-08-10 19:36:02,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13800, loss[loss=0.077, beats_loss=0.01625, ecapa_loss=0.0002086, whisper_loss=0.05866, over 16513.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0117, ecapa_loss=0.000235, whisper_loss=0.09421, over 3849970.55 frames. ], batch size: 67, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:36:06,014 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 28 from LS+wenet, 18 from Vox, 24 from AS
2024-08-10 19:36:16,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=717820.0, ans=0.0
2024-08-10 19:36:19,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=717820.0, ans=0.125
2024-08-10 19:36:25,979 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.333e-02
2024-08-10 19:36:46,388 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.322e+01 2.754e+01 3.224e+01 3.629e+01 6.153e+01, threshold=6.448e+01, percent-clipped=0.0
2024-08-10 19:36:48,461 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-10 19:36:49,553 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 31 from LS+wenet, 18 from Vox, 29 from AS
2024-08-10 19:37:06,081 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 from AS
2024-08-10 19:37:13,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=718120.0, ans=0.125
2024-08-10 19:37:21,049 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13850, loss[loss=0.1223, beats_loss=0.01052, ecapa_loss=0.0002298, whisper_loss=0.1095, over 17329.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01167, ecapa_loss=0.0002349, whisper_loss=0.09523, over 3869276.71 frames. ], batch size: 69, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:37:21,817 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=15.0
2024-08-10 19:37:29,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=718220.0, ans=0.0
2024-08-10 19:37:31,010 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=9.084e-01
2024-08-10 19:37:35,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=718320.0, ans=0.125
2024-08-10 19:37:51,617 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 from AS
2024-08-10 19:37:58,277 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 23 from Vox, 27 from AS
2024-08-10 19:38:04,828 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 23 from Vox, 27 from AS
2024-08-10 19:38:08,550 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 23 from Vox, 38 from AS
2024-08-10 19:38:17,964 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 from AS
2024-08-10 19:38:36,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=718620.0, ans=0.0
2024-08-10 19:38:39,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=718620.0, ans=0.0
2024-08-10 19:38:41,129 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13900, loss[loss=0.1146, beats_loss=0.008638, ecapa_loss=0.0002588, whisper_loss=0.1034, over 19902.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01161, ecapa_loss=0.0002348, whisper_loss=0.09544, over 3868132.16 frames. ], batch size: 78, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:38:58,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=718820.0, ans=0.0
2024-08-10 19:39:11,298 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-10 19:39:24,667 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.959e+01 3.276e+01 3.717e+01 7.288e+01, threshold=6.551e+01, percent-clipped=2.0
2024-08-10 19:39:57,714 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 13950, loss[loss=0.101, beats_loss=0.01087, ecapa_loss=0.0002303, whisper_loss=0.08779, over 17004.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01169, ecapa_loss=0.000234, whisper_loss=0.09476, over 3871757.06 frames. ], batch size: 63, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:40:05,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0
2024-08-10 19:40:12,068 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 from AS
2024-08-10 19:40:15,486 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0
2024-08-10 19:40:32,513 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 16 from Vox, 42 from AS
2024-08-10 19:40:42,937 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.260e-01
2024-08-10 19:40:53,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=719520.0, ans=0.0
2024-08-10 19:40:57,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=10.0
2024-08-10 19:41:13,741 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 14000, loss[loss=0.1207, beats_loss=0.008853, ecapa_loss=0.0002446, whisper_loss=0.1094, over 18971.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01166, ecapa_loss=0.0002311, whisper_loss=0.09499, over 3879692.94 frames. ], batch size: 74, lr: 1.18e-02, grad_scale: 1099511627776.0
2024-08-10 19:41:50,452 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.033e-02
2024-08-10 19:41:59,737 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.816e+01 3.391e+01 3.815e+01 6.287e+01, threshold=6.783e+01, percent-clipped=0.0
2024-08-10 19:42:00,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=719920.0, ans=0.125
2024-08-10 19:42:07,009 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 14 from Vox, 36 from AS
2024-08-10 19:42:18,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=720120.0, ans=0.0
2024-08-10 19:42:26,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720120.0, ans=0.1
2024-08-10 19:42:31,408 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 from AS
2024-08-10 19:42:32,901 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 from AS
2024-08-10 19:42:34,380 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 14050, loss[loss=0.108, beats_loss=0.01025, ecapa_loss=0.0002807, whisper_loss=0.09498, over 15305.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01176, ecapa_loss=0.0002308, whisper_loss=0.09425, over 3868466.47 frames. ], batch size: 62, lr: 1.18e-02, grad_scale: 2199023255552.0
2024-08-10 19:42:57,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.70 vs. limit=10.0
2024-08-10 19:43:51,613 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 14100, loss[loss=0.09852, beats_loss=0.01258, ecapa_loss=0.0002996, whisper_loss=0.08294, over 21240.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01173, ecapa_loss=0.0002294, whisper_loss=0.09517, over 3910841.38 frames. ], batch size: 89, lr: 1.17e-02, grad_scale: 2199023255552.0
2024-08-10 19:43:57,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=720720.0, ans=0.125
2024-08-10 19:44:00,388 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 from AS
2024-08-10 19:44:12,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=720820.0, ans=0.0
2024-08-10 19:44:15,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=720820.0, ans=0.125
2024-08-10 19:44:19,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=720920.0, ans=0.125
2024-08-10 19:44:21,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=12.0
2024-08-10 19:44:28,451 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 from AS
2024-08-10 19:44:32,641 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.752e+01 3.141e+01 3.762e+01 7.016e+01, threshold=6.282e+01, percent-clipped=2.0
2024-08-10 19:44:39,895 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 16 from Vox, 43 from AS
2024-08-10 19:44:40,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=22.5
2024-08-10 19:44:52,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=721120.0, ans=0.125
2024-08-10 19:45:00,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=721120.0, ans=0.0
2024-08-10 19:45:06,084 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 14150, loss[loss=0.112, beats_loss=0.01072, ecapa_loss=0.0003117, whisper_loss=0.09816, over 14841.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01183, ecapa_loss=0.0002297, whisper_loss=0.09474, over 3869326.75 frames. ], batch size: 63, lr: 1.17e-02, grad_scale: 2199023255552.0
2024-08-10 19:45:27,612 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 14 from LS+wenet, 24 from Vox, 26 from AS
2024-08-10 19:45:27,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=721320.0, ans=0.125
2024-08-10 19:46:06,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=721620.0, ans=0.025
2024-08-10 19:46:07,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=721620.0, ans=0.0
2024-08-10 19:46:17,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.98 vs. limit=15.0
2024-08-10 19:46:21,281 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 14200, loss[loss=0.1146, beats_loss=0.01047, ecapa_loss=0.0002261, whisper_loss=0.1019, over 22446.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01182, ecapa_loss=0.0002271, whisper_loss=0.09444, over 3875208.76 frames. ], batch size: 91, lr: 1.17e-02, grad_scale: 2199023255552.0
2024-08-10 19:46:29,509 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 from AS
2024-08-10 19:46:31,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=721720.0, ans=0.2
2024-08-10 19:46:36,776 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 from AS
2024-08-10 19:46:54,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0
2024-08-10 19:47:04,059 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.464e+01 2.830e+01 3.191e+01 3.752e+01 5.497e+01, threshold=6.381e+01, percent-clipped=0.0
2024-08-10 19:47:13,352 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0
2024-08-10 19:47:16,293 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.32 vs. limit=15.0
2024-08-10 19:47:33,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=722120.0, ans=0.05
2024-08-10 19:47:38,102 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 14250, loss[loss=0.1272, beats_loss=0.01087, ecapa_loss=0.0002001, whisper_loss=0.1143, over 15436.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01177, ecapa_loss=0.0002277, whisper_loss=0.09492, over 3916004.38 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 2199023255552.0
2024-08-10 19:47:59,953 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 from AS
2024-08-10 19:48:07,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=722320.0, ans=0.0
2024-08-10 19:48:18,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.83 vs. limit=15.0
2024-08-10 19:48:20,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722420.0, ans=0.1
2024-08-10 19:48:27,998 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 from AS
2024-08-10 19:48:31,601 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0
2024-08-10 19:48:56,923 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 14300, loss[loss=0.1074, beats_loss=0.01068, ecapa_loss=0.0002621, whisper_loss=0.09408, over 20749.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01171, ecapa_loss=0.0002284, whisper_loss=0.09527, over 3934042.47 frames. ], batch size: 84, lr: 1.17e-02, grad_scale: 2199023255552.0
2024-08-10 19:49:40,844 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.839e+01 3.149e+01 3.823e+01 7.710e+01, threshold=6.298e+01, percent-clipped=1.0
2024-08-10 19:50:05,497 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 29 from LS+wenet, 21 from Vox, 22 from AS
2024-08-10 19:50:13,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=723220.0, ans=0.0
2024-08-10 19:50:15,320 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 14350, loss[loss=0.1117, beats_loss=0.01057, ecapa_loss=0.0002249, whisper_loss=0.09889, over 16938.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01173, ecapa_loss=0.0002282, whisper_loss=0.095, over 3961125.75 frames. ], batch size: 67, lr: 1.17e-02, grad_scale: 2199023255552.0
2024-08-10 19:50:15,443 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
31 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 19:50:34,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=723320.0, ans=0.125 2024-08-10 19:50:41,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=723320.0, ans=0.0 2024-08-10 19:50:44,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=723420.0, ans=10.0 2024-08-10 19:50:45,611 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 21 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-10 19:50:50,403 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.22 vs. limit=15.0 2024-08-10 19:51:06,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=723520.0, ans=0.125 2024-08-10 19:51:21,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=723620.0, ans=0.2 2024-08-10 19:51:24,561 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 19:51:30,392 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 14400, loss[loss=0.08719, beats_loss=0.01479, ecapa_loss=0.0001885, whisper_loss=0.07052, over 21201.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01175, ecapa_loss=0.0002284, whisper_loss=0.0949, over 3956000.03 frames. ], batch size: 86, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:52:10,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. 
limit=15.0 2024-08-10 19:52:11,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.768e+01 3.038e+01 3.446e+01 5.868e+01, threshold=6.077e+01, percent-clipped=0.0 2024-08-10 19:52:29,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=724120.0, ans=0.0 2024-08-10 19:52:37,843 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 19:52:47,406 INFO [train_multi_KD3.py:1116] (2/4) Epoch 5, batch 14450, loss[loss=0.1316, beats_loss=0.009596, ecapa_loss=0.0002582, whisper_loss=0.1195, over 22778.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01167, ecapa_loss=0.0002304, whisper_loss=0.09482, over 3911960.95 frames. ], batch size: 90, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:52:50,883 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 35 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 19:53:02,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=724320.0, ans=0.125 2024-08-10 19:53:22,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=724420.0, ans=0.125 2024-08-10 19:53:22,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=724420.0, ans=0.2 2024-08-10 19:53:44,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=724520.0, ans=0.0 2024-08-10 19:53:45,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=724520.0, ans=0.125 2024-08-10 19:54:34,071 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 0, loss[loss=0.115, beats_loss=0.01034, ecapa_loss=0.0002577, whisper_loss=0.1021, over 15174.00 frames. 
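The `[optim.py:476] Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=...` lines above fit a pattern where the clipping threshold tracks roughly `clipping_scale` times the median of recent gradient norms (e.g. quartile median 3.038e+01 with threshold 6.077e+01 at Clipping_scale=2.0). The sketch below models that relationship; it is an assumption-laden reconstruction for readability, not the actual icefall `optim.py` code, and all names here are invented.

```python
import statistics

class GradNormClipper:
    """Illustrative sketch of adaptive gradient clipping where the
    threshold is clipping_scale x median of a window of recent grad
    norms, mirroring the quartile/threshold/percent-clipped log lines.
    Not the actual icefall implementation."""

    def __init__(self, clipping_scale=2.0, window=200):
        self.clipping_scale = clipping_scale
        self.window = window            # how many recent norms to keep
        self.norms = []
        self.num_clipped = 0
        self.num_steps = 0

    def step(self, grad_norm):
        """Record one step's gradient norm; return (scale, threshold),
        where `scale` multiplies the gradients (1.0 = no clipping)."""
        self.norms.append(grad_norm)
        self.norms = self.norms[-self.window:]
        self.num_steps += 1
        threshold = self.clipping_scale * statistics.median(self.norms)
        if grad_norm > threshold:
            self.num_clipped += 1
        return min(1.0, threshold / grad_norm), threshold

    def quartiles(self):
        """(min, Q1, median, Q3, max) of the window, as logged."""
        q1, q2, q3 = statistics.quantiles(self.norms, n=4,
                                          method='inclusive')
        return (min(self.norms), q1, q2, q3, max(self.norms))

    def percent_clipped(self):
        return 100.0 * self.num_clipped / max(1, self.num_steps)
```

Under this model, a step whose norm exceeds twice the running median gets scaled down and counted toward `percent-clipped`, which stays near 0-2% in the lines above.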
], tot_loss[loss=0.115, beats_loss=0.01034, ecapa_loss=0.0002577, whisper_loss=0.1021, over 15174.00 frames. ], batch size: 62, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 19:54:34,072 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 19:55:10,672 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on ASR_libri: loss=0.2614, beats_loss=0, ecapa_loss=0.0007237, whisper_loss=0.2541, over 922467.00 frames. 2024-08-10 19:55:26,899 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on SV_voxceleb1: loss=0.006205, beats_loss=0, ecapa_loss=0.0006205, whisper_loss=0, over 939242.00 frames. 2024-08-10 19:57:12,806 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on AT_audioset: loss=0.02628, beats_loss=0.02628, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 19:57:12,810 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 19:57:12,988 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-10 19:57:58,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=724750.0, ans=0.0 2024-08-10 19:58:02,656 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 19:58:22,725 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-10 19:58:25,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=724850.0, ans=0.1 2024-08-10 19:58:40,590 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 3.034e+01 3.419e+01 4.003e+01 7.099e+01, threshold=6.838e+01, percent-clipped=1.0 2024-08-10 19:59:15,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 50, loss[loss=0.1172, beats_loss=0.008861, ecapa_loss=0.0002759, whisper_loss=0.1056, over 17122.00 frames. 
], tot_loss[loss=0.1076, beats_loss=0.01131, ecapa_loss=0.0002379, whisper_loss=0.09395, over 871064.42 frames. ], batch size: 65, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 19:59:16,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725150.0, ans=0.1 2024-08-10 19:59:59,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=725250.0, ans=0.125 2024-08-10 19:59:59,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=725250.0, ans=0.125 2024-08-10 20:00:06,062 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 20:00:14,882 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.54 vs. limit=10.0 2024-08-10 20:00:21,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=725350.0, ans=0.125 2024-08-10 20:00:21,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=725350.0, ans=0.02 2024-08-10 20:00:21,836 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=15.0 2024-08-10 20:00:25,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=725450.0, ans=0.125 2024-08-10 20:00:27,652 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-10 20:00:30,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.92 vs. 
limit=12.0 2024-08-10 20:01:01,063 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-10 20:01:09,129 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 100, loss[loss=0.1099, beats_loss=0.01036, ecapa_loss=0.0002317, whisper_loss=0.09719, over 18106.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01113, ecapa_loss=0.0002348, whisper_loss=0.09299, over 1517763.94 frames. ], batch size: 72, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:01:27,515 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 20:02:13,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=12.0 2024-08-10 20:02:28,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.882e+01 3.222e+01 3.754e+01 5.300e+01, threshold=6.444e+01, percent-clipped=0.0 2024-08-10 20:02:42,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=726050.0, ans=0.2 2024-08-10 20:02:43,598 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-10 20:02:58,313 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 150, loss[loss=0.1234, beats_loss=0.01148, ecapa_loss=0.0002135, whisper_loss=0.1098, over 16285.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01121, ecapa_loss=0.000228, whisper_loss=0.09349, over 2013199.04 frames. ], batch size: 62, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:03:07,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.01 vs. limit=15.0 2024-08-10 20:03:51,338 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 20:03:52,569 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
29 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-10 20:03:54,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=726450.0, ans=0.0 2024-08-10 20:03:59,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=726450.0, ans=0.1 2024-08-10 20:04:06,879 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2024-08-10 20:04:13,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=726550.0, ans=0.125 2024-08-10 20:04:17,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=726550.0, ans=0.2 2024-08-10 20:04:17,834 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-10 20:04:22,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 200, loss[loss=0.1119, beats_loss=0.01315, ecapa_loss=0.0002466, whisper_loss=0.09633, over 22110.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01135, ecapa_loss=0.0002264, whisper_loss=0.09343, over 2418502.28 frames. ], batch size: 90, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:04:22,857 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 20:04:30,830 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.48 vs. limit=15.0 2024-08-10 20:04:42,267 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 20:04:51,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.55 vs. limit=10.0 2024-08-10 20:05:08,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=726850.0, ans=0.1 2024-08-10 20:05:14,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=726950.0, ans=0.0 2024-08-10 20:05:15,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=726950.0, ans=0.125 2024-08-10 20:05:19,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.639e+01 2.951e+01 3.334e+01 6.571e+01, threshold=5.903e+01, percent-clipped=1.0 2024-08-10 20:05:19,765 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 20:05:25,707 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 20:05:31,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=727050.0, ans=0.0 2024-08-10 20:05:39,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2024-08-10 20:05:40,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=727150.0, ans=0.0 2024-08-10 20:05:41,612 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 250, loss[loss=0.08243, beats_loss=0.01573, ecapa_loss=0.0002188, whisper_loss=0.06451, over 14048.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01141, ecapa_loss=0.0002264, whisper_loss=0.09394, over 2697492.75 frames. 
], batch size: 57, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:05:43,088 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 20:05:44,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.68 vs. limit=15.0 2024-08-10 20:06:13,951 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-10 20:06:15,590 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 20:06:16,633 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 20:06:20,071 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 22 from LS+wenet, 9 from Vox, 23 fro AS 2024-08-10 20:06:20,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=727350.0, ans=0.04949747468305833 2024-08-10 20:06:31,007 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 20:06:35,850 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-10 20:06:51,790 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-08-10 20:06:53,900 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 300, loss[loss=0.1199, beats_loss=0.0123, ecapa_loss=0.0002071, whisper_loss=0.1056, over 23056.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01156, ecapa_loss=0.000224, whisper_loss=0.09335, over 2940370.25 frames. 
], batch size: 93, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:07:11,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=727750.0, ans=0.125 2024-08-10 20:07:24,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=727850.0, ans=0.1 2024-08-10 20:07:42,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=727950.0, ans=0.0 2024-08-10 20:07:42,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=727950.0, ans=0.05 2024-08-10 20:07:45,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.770e+01 3.156e+01 3.793e+01 6.617e+01, threshold=6.313e+01, percent-clipped=1.0 2024-08-10 20:08:02,072 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 20:08:02,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728050.0, ans=0.1 2024-08-10 20:08:02,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728050.0, ans=0.1 2024-08-10 20:08:07,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 350, loss[loss=0.08412, beats_loss=0.01075, ecapa_loss=0.0002585, whisper_loss=0.07078, over 13133.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01152, ecapa_loss=0.0002244, whisper_loss=0.09376, over 3116959.21 frames. ], batch size: 55, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:08:08,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=728150.0, ans=0.1 2024-08-10 20:08:27,603 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 20:08:34,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=728250.0, ans=0.0 2024-08-10 20:08:36,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728350.0, ans=0.1 2024-08-10 20:08:58,927 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=12.0 2024-08-10 20:09:05,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=728550.0, ans=0.0 2024-08-10 20:09:10,040 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 20:09:21,156 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 400, loss[loss=0.1052, beats_loss=0.01149, ecapa_loss=0.0002374, whisper_loss=0.09137, over 17898.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01159, ecapa_loss=0.0002227, whisper_loss=0.0935, over 3281465.91 frames. ], batch size: 68, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:09:27,911 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.17 vs. limit=10.0 2024-08-10 20:09:43,352 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 27 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 20:09:50,463 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 20:09:58,070 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 20:10:03,626 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 20:10:12,490 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.814e+01 3.145e+01 3.714e+01 1.358e+02, threshold=6.291e+01, percent-clipped=2.0 2024-08-10 20:10:12,603 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 20:10:13,992 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 20:10:32,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=729150.0, ans=0.125 2024-08-10 20:10:33,529 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 450, loss[loss=0.1327, beats_loss=0.008732, ecapa_loss=0.0002694, whisper_loss=0.1212, over 22600.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01163, ecapa_loss=0.0002216, whisper_loss=0.09355, over 3395116.64 frames. ], batch size: 89, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:10:55,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=729250.0, ans=0.1 2024-08-10 20:11:29,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=729450.0, ans=0.0 2024-08-10 20:11:29,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=729450.0, ans=0.0 2024-08-10 20:11:33,731 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2024-08-10 20:11:34,404 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 20:11:47,392 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 500, loss[loss=0.1087, beats_loss=0.0103, ecapa_loss=0.0002455, whisper_loss=0.09597, over 18375.00 frames. 
], tot_loss[loss=0.1075, beats_loss=0.01154, ecapa_loss=0.0002219, whisper_loss=0.09373, over 3462581.81 frames. ], batch size: 74, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:11:52,491 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 35 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 20:11:55,767 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-10 20:12:09,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=729750.0, ans=0.125 2024-08-10 20:12:23,857 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=12.0 2024-08-10 20:12:41,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.724e+01 3.066e+01 3.405e+01 6.797e+01, threshold=6.131e+01, percent-clipped=1.0 2024-08-10 20:12:47,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2024-08-10 20:13:02,848 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 550, loss[loss=0.08502, beats_loss=0.01154, ecapa_loss=0.0002023, whisper_loss=0.07146, over 18310.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01154, ecapa_loss=0.0002209, whisper_loss=0.09402, over 3576345.17 frames. ], batch size: 72, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:13:05,251 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.43 vs. limit=22.5 2024-08-10 20:13:15,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=730150.0, ans=0.1 2024-08-10 20:13:21,039 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
25 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 20:13:37,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=730350.0, ans=0.125 2024-08-10 20:13:44,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=730350.0, ans=0.0 2024-08-10 20:14:07,716 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 20:14:12,175 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 20:14:13,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.68 vs. limit=10.0 2024-08-10 20:14:16,884 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 20:14:26,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=730550.0, ans=0.0 2024-08-10 20:14:41,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=730650.0, ans=0.125 2024-08-10 20:14:42,505 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 600, loss[loss=0.1051, beats_loss=0.01219, ecapa_loss=0.000153, whisper_loss=0.09139, over 15919.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01148, ecapa_loss=0.0002191, whisper_loss=0.09384, over 3598588.73 frames. 
], batch size: 57, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:14:43,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=730650.0, ans=0.125 2024-08-10 20:14:46,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=730650.0, ans=0.125 2024-08-10 20:15:14,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.43 vs. limit=10.0 2024-08-10 20:15:36,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=730950.0, ans=0.0 2024-08-10 20:15:37,496 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.552e+01 2.834e+01 3.243e+01 4.859e+01, threshold=5.668e+01, percent-clipped=0.0 2024-08-10 20:15:42,003 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 20:15:48,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=731050.0, ans=0.5 2024-08-10 20:16:01,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=731050.0, ans=0.05 2024-08-10 20:16:04,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=731050.0, ans=0.125 2024-08-10 20:16:08,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 650, loss[loss=0.1345, beats_loss=0.009922, ecapa_loss=0.0002486, whisper_loss=0.122, over 22793.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0116, ecapa_loss=0.00022, whisper_loss=0.09322, over 3643389.50 frames. 
], batch size: 91, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:16:14,915 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 20:16:25,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=731150.0, ans=0.125 2024-08-10 20:16:25,543 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2024-08-10 20:16:51,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=731350.0, ans=0.1 2024-08-10 20:16:51,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0 2024-08-10 20:17:38,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-08-10 20:17:49,429 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 35 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 20:17:50,852 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 700, loss[loss=0.1155, beats_loss=0.01006, ecapa_loss=0.0001887, whisper_loss=0.1035, over 22625.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01156, ecapa_loss=0.0002205, whisper_loss=0.09322, over 3651502.40 frames. 
], batch size: 87, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:18:02,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=731650.0, ans=0.0 2024-08-10 20:19:11,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731950.0, ans=0.1 2024-08-10 20:19:15,604 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.666e+01 3.015e+01 3.385e+01 4.873e+01, threshold=6.030e+01, percent-clipped=0.0 2024-08-10 20:19:37,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=732050.0, ans=0.125 2024-08-10 20:19:46,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=732050.0, ans=0.0 2024-08-10 20:19:49,636 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 750, loss[loss=0.09629, beats_loss=0.008385, ecapa_loss=0.0002712, whisper_loss=0.0852, over 13784.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01156, ecapa_loss=0.0002174, whisper_loss=0.09341, over 3691316.82 frames. ], batch size: 57, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:19:51,039 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 20:20:27,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=732250.0, ans=0.125 2024-08-10 20:20:58,177 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 20:20:58,453 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.041e-01 2024-08-10 20:21:19,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=732450.0, ans=0.125 2024-08-10 20:21:29,339 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-10 20:21:32,436 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-10 20:21:48,273 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 800, loss[loss=0.09212, beats_loss=0.01201, ecapa_loss=0.0002171, whisper_loss=0.07794, over 20423.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0115, ecapa_loss=0.0002187, whisper_loss=0.09407, over 3723437.07 frames. ], batch size: 82, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:22:08,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=732650.0, ans=0.1 2024-08-10 20:22:34,879 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 32 from Vox, 28 fro AS 2024-08-10 20:22:41,078 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 20:22:42,278 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-10 20:22:57,673 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 20:23:13,812 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.807e+01 3.275e+01 3.755e+01 8.468e+01, threshold=6.551e+01, percent-clipped=2.0 2024-08-10 20:23:26,962 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
19 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 20:23:35,301 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0 2024-08-10 20:23:43,182 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 850, loss[loss=0.08661, beats_loss=0.009843, ecapa_loss=0.0001979, whisper_loss=0.07478, over 14570.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01154, ecapa_loss=0.000219, whisper_loss=0.09314, over 3739912.04 frames. ], batch size: 56, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:23:46,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=733150.0, ans=0.125 2024-08-10 20:23:48,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=733150.0, ans=0.1 2024-08-10 20:24:07,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=733250.0, ans=0.0 2024-08-10 20:24:25,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=733350.0, ans=0.125 2024-08-10 20:24:35,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=733450.0, ans=0.04949747468305833 2024-08-10 20:24:38,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=733450.0, ans=0.125 2024-08-10 20:24:40,574 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 20:24:43,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=733450.0, ans=0.05 2024-08-10 20:25:09,151 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 900, loss[loss=0.1074, beats_loss=0.0142, ecapa_loss=0.000198, whisper_loss=0.09121, over 22892.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.0116, ecapa_loss=0.0002183, whisper_loss=0.09341, over 3796792.97 frames. ], batch size: 91, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:25:28,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2024-08-10 20:26:12,750 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.731e+01 3.012e+01 3.536e+01 7.102e+01, threshold=6.024e+01, percent-clipped=1.0 2024-08-10 20:26:20,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=734050.0, ans=0.125 2024-08-10 20:26:24,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=734050.0, ans=0.2 2024-08-10 20:26:27,032 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 20:26:29,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=734050.0, ans=0.07 2024-08-10 20:26:30,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=734050.0, ans=0.95 2024-08-10 20:26:38,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 950, loss[loss=0.09281, beats_loss=0.01226, ecapa_loss=0.0002018, whisper_loss=0.07853, over 17217.00 frames. 
], tot_loss[loss=0.107, beats_loss=0.01158, ecapa_loss=0.0002165, whisper_loss=0.09323, over 3782164.80 frames. ], batch size: 67, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:26:53,257 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-10 20:26:59,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=734250.0, ans=0.125 2024-08-10 20:27:01,318 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 22 from LS+wenet, 9 from Vox, 23 fro AS 2024-08-10 20:27:10,047 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 20:27:11,764 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 20:27:20,168 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.77 vs. limit=22.5 2024-08-10 20:27:29,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=734450.0, ans=0.1 2024-08-10 20:27:29,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734450.0, ans=0.1 2024-08-10 20:27:42,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=734450.0, ans=0.125 2024-08-10 20:27:47,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=734550.0, ans=0.125 2024-08-10 20:27:49,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=734550.0, ans=0.125 2024-08-10 20:27:50,215 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
19 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-10 20:27:55,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2024-08-10 20:28:01,511 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1000, loss[loss=0.1152, beats_loss=0.009947, ecapa_loss=0.0002181, whisper_loss=0.103, over 18903.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01168, ecapa_loss=0.0002177, whisper_loss=0.09208, over 3787576.21 frames. ], batch size: 75, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:28:05,050 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 20:28:12,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=734650.0, ans=0.125 2024-08-10 20:28:14,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=15.0 2024-08-10 20:28:15,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=734650.0, ans=0.5 2024-08-10 20:28:27,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.30 vs. limit=6.0 2024-08-10 20:28:29,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=734750.0, ans=0.0 2024-08-10 20:28:36,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.11 vs. 
limit=12.0 2024-08-10 20:28:53,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=734950.0, ans=0.0 2024-08-10 20:28:56,021 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0 2024-08-10 20:29:00,008 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.726e+01 3.092e+01 3.601e+01 1.041e+02, threshold=6.184e+01, percent-clipped=1.0 2024-08-10 20:29:06,285 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 20:29:11,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=735050.0, ans=0.1 2024-08-10 20:29:17,063 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0 2024-08-10 20:29:19,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=735050.0, ans=0.0 2024-08-10 20:29:25,688 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1050, loss[loss=0.1028, beats_loss=0.01209, ecapa_loss=0.0001868, whisper_loss=0.08881, over 18157.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0116, ecapa_loss=0.0002182, whisper_loss=0.09276, over 3768496.74 frames. 
], batch size: 68, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:29:50,900 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 20:30:29,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735450.0, ans=0.1 2024-08-10 20:30:40,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2024-08-10 20:30:41,614 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 20:30:45,534 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 20:30:50,956 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 20:30:51,904 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1100, loss[loss=0.1133, beats_loss=0.01192, ecapa_loss=0.0002108, whisper_loss=0.09931, over 22171.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01162, ecapa_loss=0.0002171, whisper_loss=0.09268, over 3772424.97 frames. ], batch size: 88, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:31:31,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=735850.0, ans=0.2 2024-08-10 20:31:36,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=735850.0, ans=0.2 2024-08-10 20:31:50,794 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.214e+01 2.770e+01 3.006e+01 3.661e+01 6.910e+01, threshold=6.012e+01, percent-clipped=1.0 2024-08-10 20:32:15,953 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1150, loss[loss=0.1043, beats_loss=0.01078, ecapa_loss=0.0002149, whisper_loss=0.09141, over 18832.00 frames. 
], tot_loss[loss=0.1068, beats_loss=0.01157, ecapa_loss=0.0002165, whisper_loss=0.09311, over 3773362.68 frames. ], batch size: 74, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:32:31,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=736150.0, ans=0.0 2024-08-10 20:32:50,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=736250.0, ans=0.125 2024-08-10 20:32:55,360 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 20:33:22,176 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-10 20:33:27,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=736550.0, ans=0.125 2024-08-10 20:33:27,996 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 20:33:39,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1200, loss[loss=0.1184, beats_loss=0.01296, ecapa_loss=0.0001983, whisper_loss=0.1035, over 19435.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01153, ecapa_loss=0.0002158, whisper_loss=0.09312, over 3775206.35 frames. ], batch size: 76, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:33:59,391 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 20:34:07,086 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 20:34:10,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=736850.0, ans=0.09899494936611666 2024-08-10 20:34:33,342 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.774e+01 3.140e+01 3.554e+01 5.402e+01, threshold=6.279e+01, percent-clipped=0.0 2024-08-10 20:34:37,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.26 vs. limit=22.5 2024-08-10 20:34:41,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=737050.0, ans=0.125 2024-08-10 20:34:57,085 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1250, loss[loss=0.1145, beats_loss=0.01395, ecapa_loss=0.000213, whisper_loss=0.09837, over 22395.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01151, ecapa_loss=0.0002151, whisper_loss=0.0937, over 3777293.88 frames. ], batch size: 92, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:35:24,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=737250.0, ans=0.125 2024-08-10 20:35:30,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=737350.0, ans=0.0 2024-08-10 20:35:34,898 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 20:35:36,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737350.0, ans=0.1 2024-08-10 20:35:38,351 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-10 20:36:12,449 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1300, loss[loss=0.1081, beats_loss=0.01237, ecapa_loss=0.0001892, whisper_loss=0.09385, over 23565.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01156, ecapa_loss=0.000214, whisper_loss=0.09309, over 3778915.57 frames. ], batch size: 90, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:36:22,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737650.0, ans=0.1 2024-08-10 20:36:23,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=737650.0, ans=0.0 2024-08-10 20:36:32,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=737750.0, ans=0.125 2024-08-10 20:36:47,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.45 vs. limit=15.0 2024-08-10 20:36:49,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=737850.0, ans=0.015 2024-08-10 20:36:53,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=737850.0, ans=0.125 2024-08-10 20:36:53,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=737850.0, ans=0.0 2024-08-10 20:36:58,032 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 20:37:03,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.62 vs. 
limit=15.0 2024-08-10 20:37:08,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.805e+01 3.070e+01 3.591e+01 5.506e+01, threshold=6.140e+01, percent-clipped=0.0 2024-08-10 20:37:20,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0 2024-08-10 20:37:34,006 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1350, loss[loss=0.1084, beats_loss=0.01085, ecapa_loss=0.0002268, whisper_loss=0.09529, over 22747.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0115, ecapa_loss=0.0002149, whisper_loss=0.09321, over 3792478.24 frames. ], batch size: 90, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:37:51,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=738250.0, ans=0.0 2024-08-10 20:38:03,514 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-10 20:38:06,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=738350.0, ans=0.125 2024-08-10 20:38:07,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=738350.0, ans=0.0 2024-08-10 20:38:39,097 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 20:38:39,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=738550.0, ans=0.125 2024-08-10 20:38:41,870 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 20:38:43,794 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 20:38:46,810 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
23 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 20:38:51,479 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 20:38:56,049 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1400, loss[loss=0.1273, beats_loss=0.01019, ecapa_loss=0.000207, whisper_loss=0.1151, over 23993.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01146, ecapa_loss=0.0002146, whisper_loss=0.0943, over 3793329.62 frames. ], batch size: 93, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:39:52,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=738950.0, ans=0.0 2024-08-10 20:39:56,924 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.641e+01 2.966e+01 3.393e+01 5.160e+01, threshold=5.932e+01, percent-clipped=0.0 2024-08-10 20:39:57,086 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 20:40:03,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=739050.0, ans=10.0 2024-08-10 20:40:05,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=739050.0, ans=0.125 2024-08-10 20:40:16,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=739050.0, ans=0.2 2024-08-10 20:40:23,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1450, loss[loss=0.12, beats_loss=0.01202, ecapa_loss=0.0001966, whisper_loss=0.106, over 15760.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01155, ecapa_loss=0.000214, whisper_loss=0.09378, over 3776919.15 frames. ], batch size: 57, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:41:06,758 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 20:41:26,459 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 20:41:42,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=739350.0, ans=0.0 2024-08-10 20:42:05,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=739550.0, ans=0.125 2024-08-10 20:42:18,437 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1500, loss[loss=0.1127, beats_loss=0.01193, ecapa_loss=0.0001976, whisper_loss=0.09876, over 20542.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01163, ecapa_loss=0.0002141, whisper_loss=0.09337, over 3798246.22 frames. ], batch size: 83, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:42:18,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=739650.0, ans=0.125 2024-08-10 20:42:37,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=739750.0, ans=0.125 2024-08-10 20:43:01,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=739850.0, ans=0.05 2024-08-10 20:43:06,771 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
19 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-10 20:43:14,431 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.729e+01 3.073e+01 3.413e+01 6.253e+01, threshold=6.146e+01, percent-clipped=1.0 2024-08-10 20:43:14,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=739950.0, ans=0.125 2024-08-10 20:43:38,806 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1550, loss[loss=0.1147, beats_loss=0.01178, ecapa_loss=0.0001689, whisper_loss=0.1013, over 19897.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0116, ecapa_loss=0.0002142, whisper_loss=0.09319, over 3771047.31 frames. ], batch size: 78, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:43:41,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=12.0 2024-08-10 20:43:57,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=740250.0, ans=0.125 2024-08-10 20:44:22,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=740350.0, ans=0.125 2024-08-10 20:44:22,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=740350.0, ans=0.125 2024-08-10 20:44:59,030 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2024-08-10 20:45:00,489 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1600, loss[loss=0.08876, beats_loss=0.01149, ecapa_loss=0.0001899, whisper_loss=0.07537, over 15718.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01161, ecapa_loss=0.0002123, whisper_loss=0.09301, over 3785477.28 frames. 
], batch size: 63, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:45:04,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=740650.0, ans=0.2 2024-08-10 20:45:10,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=740650.0, ans=0.1 2024-08-10 20:45:39,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=740850.0, ans=0.0 2024-08-10 20:45:54,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=740950.0, ans=0.0 2024-08-10 20:45:58,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.574e+01 2.929e+01 3.457e+01 5.264e+01, threshold=5.858e+01, percent-clipped=0.0 2024-08-10 20:46:14,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741050.0, ans=0.1 2024-08-10 20:46:17,508 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2024-08-10 20:46:23,211 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1650, loss[loss=0.1083, beats_loss=0.01087, ecapa_loss=0.0001986, whisper_loss=0.09546, over 20126.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01152, ecapa_loss=0.0002115, whisper_loss=0.09441, over 3832742.71 frames. ], batch size: 72, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:46:24,798 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
21 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-10 20:46:29,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=741150.0, ans=0.125 2024-08-10 20:46:43,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=741250.0, ans=0.125 2024-08-10 20:46:45,988 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 20:46:54,757 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 10 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 20:47:01,861 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 20:47:03,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=741350.0, ans=0.125 2024-08-10 20:47:19,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=741450.0, ans=0.0 2024-08-10 20:47:32,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2024-08-10 20:47:35,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=741550.0, ans=10.0 2024-08-10 20:47:40,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1700, loss[loss=0.1196, beats_loss=0.01167, ecapa_loss=0.0002092, whisper_loss=0.1058, over 18695.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01146, ecapa_loss=0.0002111, whisper_loss=0.09447, over 3831725.27 frames. ], batch size: 70, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:47:44,537 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
25 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-10 20:47:56,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=741750.0, ans=0.125 2024-08-10 20:48:03,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=741750.0, ans=0.0 2024-08-10 20:48:08,800 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 20:48:31,074 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 20:48:34,139 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.737e+01 3.042e+01 3.583e+01 5.597e+01, threshold=6.084e+01, percent-clipped=0.0 2024-08-10 20:48:39,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741950.0, ans=0.1 2024-08-10 20:48:49,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=742050.0, ans=0.125 2024-08-10 20:48:51,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=742050.0, ans=0.125 2024-08-10 20:48:56,010 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1750, loss[loss=0.1251, beats_loss=0.009033, ecapa_loss=0.0002417, whisper_loss=0.1136, over 20575.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01146, ecapa_loss=0.0002124, whisper_loss=0.09459, over 3843225.31 frames. ], batch size: 82, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:49:05,977 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.32 vs. 
limit=22.5 2024-08-10 20:49:09,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2024-08-10 20:49:10,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=742250.0, ans=0.2 2024-08-10 20:49:20,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=742250.0, ans=0.0 2024-08-10 20:49:41,038 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-10 20:49:46,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=742450.0, ans=0.125 2024-08-10 20:49:55,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=742550.0, ans=0.0 2024-08-10 20:50:00,379 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 20:50:11,714 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1800, loss[loss=0.1019, beats_loss=0.01242, ecapa_loss=0.0002676, whisper_loss=0.08685, over 18023.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01153, ecapa_loss=0.0002105, whisper_loss=0.0936, over 3834593.23 frames. ], batch size: 75, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:50:15,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=742650.0, ans=0.0 2024-08-10 20:50:24,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.22 vs. 
limit=22.5 2024-08-10 20:50:25,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=742750.0, ans=10.0 2024-08-10 20:50:30,040 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.59 vs. limit=10.0 2024-08-10 20:50:32,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=742750.0, ans=0.125 2024-08-10 20:50:44,208 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-10 20:50:51,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0 2024-08-10 20:50:53,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0 2024-08-10 20:50:59,366 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 20:51:01,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=742950.0, ans=0.0 2024-08-10 20:51:03,024 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 20:51:05,893 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.668e+01 3.016e+01 3.512e+01 6.004e+01, threshold=6.033e+01, percent-clipped=0.0 2024-08-10 20:51:29,690 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1850, loss[loss=0.08538, beats_loss=0.01036, ecapa_loss=0.0002449, whisper_loss=0.07257, over 16058.00 frames. 
], tot_loss[loss=0.1077, beats_loss=0.01147, ecapa_loss=0.0002119, whisper_loss=0.09407, over 3828954.13 frames. ], batch size: 62, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:51:50,262 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 20:51:54,249 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.06 vs. limit=8.0 2024-08-10 20:52:04,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=743350.0, ans=0.0 2024-08-10 20:52:16,028 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 20:52:20,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=743450.0, ans=0.0 2024-08-10 20:52:29,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=743550.0, ans=0.1 2024-08-10 20:52:35,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=743550.0, ans=0.0 2024-08-10 20:52:39,935 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 20:52:43,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743650.0, ans=0.1 2024-08-10 20:52:44,291 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1900, loss[loss=0.1185, beats_loss=0.01044, ecapa_loss=0.0002057, whisper_loss=0.106, over 14557.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01146, ecapa_loss=0.0002139, whisper_loss=0.0944, over 3810787.92 frames. ], batch size: 55, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:52:50,246 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 20:52:57,759 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-10 20:53:05,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=743750.0, ans=0.125 2024-08-10 20:53:12,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=743750.0, ans=0.125 2024-08-10 20:53:27,929 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-10 20:53:36,520 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.690e+01 3.145e+01 3.666e+01 6.863e+01, threshold=6.290e+01, percent-clipped=1.0 2024-08-10 20:54:01,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 1950, loss[loss=0.1133, beats_loss=0.01246, ecapa_loss=0.0001827, whisper_loss=0.09901, over 22968.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01145, ecapa_loss=0.0002177, whisper_loss=0.09449, over 3768568.23 frames. 
], batch size: 89, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:54:07,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=744150.0, ans=0.0 2024-08-10 20:54:23,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=744250.0, ans=0.125 2024-08-10 20:54:34,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=744350.0, ans=0.125 2024-08-10 20:54:38,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=744350.0, ans=0.125 2024-08-10 20:54:38,354 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.952e-01 2024-08-10 20:54:53,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=744450.0, ans=0.125 2024-08-10 20:55:05,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=744550.0, ans=0.0 2024-08-10 20:55:18,457 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2000, loss[loss=0.09765, beats_loss=0.009399, ecapa_loss=0.0002481, whisper_loss=0.08577, over 18974.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01137, ecapa_loss=0.0002208, whisper_loss=0.09477, over 3755932.35 frames. ], batch size: 75, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:55:25,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744650.0, ans=0.1 2024-08-10 20:55:55,037 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 20:55:58,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744850.0, ans=0.1 2024-08-10 20:56:06,330 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 20:56:10,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=744950.0, ans=0.2 2024-08-10 20:56:16,448 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.753e+01 3.103e+01 3.441e+01 5.353e+01, threshold=6.205e+01, percent-clipped=0.0 2024-08-10 20:56:16,611 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 20:56:19,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=744950.0, ans=0.0 2024-08-10 20:56:42,171 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2050, loss[loss=0.1043, beats_loss=0.01251, ecapa_loss=0.0001612, whisper_loss=0.09017, over 21897.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01146, ecapa_loss=0.000222, whisper_loss=0.09469, over 3774972.59 frames. ], batch size: 85, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:56:46,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=745150.0, ans=0.125 2024-08-10 20:56:56,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=745250.0, ans=0.125 2024-08-10 20:57:13,622 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 20:57:28,774 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 20:57:45,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=745550.0, ans=0.125 2024-08-10 20:57:57,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=745550.0, ans=0.0 2024-08-10 20:58:02,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2100, loss[loss=0.1154, beats_loss=0.01023, ecapa_loss=0.0001736, whisper_loss=0.1034, over 15407.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01147, ecapa_loss=0.0002226, whisper_loss=0.09442, over 3766453.65 frames. ], batch size: 59, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:58:09,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=745650.0, ans=0.125 2024-08-10 20:58:11,514 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 20:58:19,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.93 vs. limit=15.0 2024-08-10 20:58:40,991 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 21 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-10 20:58:56,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=745950.0, ans=0.125 2024-08-10 20:59:01,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=745950.0, ans=0.125 2024-08-10 20:59:03,048 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
27 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-10 20:59:06,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 2.787e+01 3.226e+01 3.870e+01 7.991e+01, threshold=6.452e+01, percent-clipped=3.0 2024-08-10 20:59:19,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=746050.0, ans=0.015 2024-08-10 20:59:31,394 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2150, loss[loss=0.1116, beats_loss=0.01048, ecapa_loss=0.0002103, whisper_loss=0.09902, over 21882.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01145, ecapa_loss=0.0002245, whisper_loss=0.09448, over 3789853.19 frames. ], batch size: 83, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:59:40,520 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-10 20:59:44,672 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-10 20:59:57,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=746250.0, ans=0.125 2024-08-10 21:00:18,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=746350.0, ans=0.0 2024-08-10 21:00:30,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=746450.0, ans=0.125 2024-08-10 21:00:57,251 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2200, loss[loss=0.1005, beats_loss=0.01162, ecapa_loss=0.0002288, whisper_loss=0.08661, over 19417.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01146, ecapa_loss=0.0002235, whisper_loss=0.09482, over 3788589.09 frames. 
], batch size: 77, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:01:19,963 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-08-10 21:01:25,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=746750.0, ans=0.125 2024-08-10 21:01:29,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=746750.0, ans=0.2 2024-08-10 21:01:37,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=746850.0, ans=0.02 2024-08-10 21:01:37,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=746850.0, ans=0.02 2024-08-10 21:01:59,205 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.667e+01 3.183e+01 3.944e+01 1.052e+02, threshold=6.365e+01, percent-clipped=1.0 2024-08-10 21:02:14,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=747050.0, ans=0.0 2024-08-10 21:02:24,530 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2250, loss[loss=0.1162, beats_loss=0.01126, ecapa_loss=0.0002478, whisper_loss=0.1025, over 18771.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01152, ecapa_loss=0.000225, whisper_loss=0.0948, over 3803960.46 frames. 
], batch size: 76, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:02:27,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747150.0, ans=0.1 2024-08-10 21:02:30,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=747150.0, ans=0.07 2024-08-10 21:02:36,510 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 21:02:50,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=747250.0, ans=0.125 2024-08-10 21:03:09,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=747350.0, ans=0.125 2024-08-10 21:03:32,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2024-08-10 21:03:33,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=747550.0, ans=0.125 2024-08-10 21:03:33,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=747550.0, ans=0.125 2024-08-10 21:03:51,110 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2300, loss[loss=0.09529, beats_loss=0.01392, ecapa_loss=0.0001854, whisper_loss=0.07952, over 21315.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01161, ecapa_loss=0.0002249, whisper_loss=0.09445, over 3853699.58 frames. ], batch size: 82, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:04:13,091 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.52 vs. 
limit=12.0 2024-08-10 21:04:16,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747750.0, ans=0.1 2024-08-10 21:04:24,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2024-08-10 21:04:26,161 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-10 21:04:41,340 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 21:04:53,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.764e+01 3.059e+01 3.552e+01 5.257e+01, threshold=6.118e+01, percent-clipped=0.0 2024-08-10 21:04:58,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=747950.0, ans=0.125 2024-08-10 21:04:59,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=748050.0, ans=0.0 2024-08-10 21:05:16,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=748050.0, ans=0.0 2024-08-10 21:05:19,919 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2350, loss[loss=0.1218, beats_loss=0.009591, ecapa_loss=0.0002005, whisper_loss=0.1102, over 14665.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01151, ecapa_loss=0.0002243, whisper_loss=0.09535, over 3848370.42 frames. 
], batch size: 54, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:05:47,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=748250.0, ans=0.125 2024-08-10 21:05:56,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748250.0, ans=0.1 2024-08-10 21:06:07,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748350.0, ans=0.1 2024-08-10 21:07:08,497 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2400, loss[loss=0.1094, beats_loss=0.0126, ecapa_loss=0.0002021, whisper_loss=0.09477, over 18483.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01158, ecapa_loss=0.0002233, whisper_loss=0.09461, over 3849763.75 frames. ], batch size: 71, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:07:09,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=748650.0, ans=0.125 2024-08-10 21:07:20,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=748650.0, ans=0.125 2024-08-10 21:07:22,756 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 21:07:47,580 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 27 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-10 21:07:53,710 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 21:08:02,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748850.0, ans=0.1 2024-08-10 21:08:05,195 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
23 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 21:08:22,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=748850.0, ans=0.125 2024-08-10 21:08:33,000 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 21:08:40,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=748950.0, ans=0.125 2024-08-10 21:08:47,480 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.712e+01 3.107e+01 3.563e+01 2.420e+02, threshold=6.213e+01, percent-clipped=2.0 2024-08-10 21:08:51,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=748950.0, ans=0.0 2024-08-10 21:09:29,384 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2450, loss[loss=0.1065, beats_loss=0.0127, ecapa_loss=0.0001951, whisper_loss=0.0919, over 21698.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01152, ecapa_loss=0.0002239, whisper_loss=0.09464, over 3839765.68 frames. ], batch size: 83, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:09:46,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=749150.0, ans=0.0 2024-08-10 21:10:25,161 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-10 21:10:27,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=749450.0, ans=0.125 2024-08-10 21:10:56,004 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 21:11:01,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2500, loss[loss=0.1156, beats_loss=0.01217, ecapa_loss=0.00018, whisper_loss=0.1017, over 18305.00 frames. 
], tot_loss[loss=0.1087, beats_loss=0.01148, ecapa_loss=0.0002235, whisper_loss=0.09502, over 3844690.90 frames. ], batch size: 70, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:11:01,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=749650.0, ans=0.125 2024-08-10 21:11:02,860 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 21:11:03,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=749650.0, ans=0.125 2024-08-10 21:11:09,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=749650.0, ans=0.05 2024-08-10 21:11:26,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=749750.0, ans=0.0 2024-08-10 21:11:30,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=749750.0, ans=0.0 2024-08-10 21:11:31,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=749750.0, ans=0.0 2024-08-10 21:12:03,313 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.73 vs. limit=22.5 2024-08-10 21:12:03,912 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+01 2.786e+01 3.132e+01 3.631e+01 5.389e+01, threshold=6.264e+01, percent-clipped=0.0 2024-08-10 21:12:07,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.20 vs. 
limit=15.0 2024-08-10 21:12:17,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=750050.0, ans=0.125 2024-08-10 21:12:32,559 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2550, loss[loss=0.09378, beats_loss=0.01539, ecapa_loss=0.0002064, whisper_loss=0.07632, over 13594.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01153, ecapa_loss=0.0002238, whisper_loss=0.09483, over 3848739.20 frames. ], batch size: 57, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:12:47,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=750150.0, ans=0.125 2024-08-10 21:12:48,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=750250.0, ans=0.0 2024-08-10 21:13:46,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=12.0 2024-08-10 21:13:49,623 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-10 21:13:53,667 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.34 vs. limit=22.5 2024-08-10 21:14:07,933 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2600, loss[loss=0.1043, beats_loss=0.01014, ecapa_loss=0.0002719, whisper_loss=0.09147, over 20754.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01153, ecapa_loss=0.0002242, whisper_loss=0.09468, over 3851760.31 frames. 
], batch size: 88, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:14:08,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=750650.0, ans=0.125 2024-08-10 21:14:47,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.11 vs. limit=15.0 2024-08-10 21:15:09,706 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.826e+01 3.235e+01 3.900e+01 8.164e+01, threshold=6.470e+01, percent-clipped=1.0 2024-08-10 21:15:31,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=751050.0, ans=0.125 2024-08-10 21:15:33,789 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2650, loss[loss=0.1007, beats_loss=0.01002, ecapa_loss=0.0002841, whisper_loss=0.08779, over 17050.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01155, ecapa_loss=0.0002242, whisper_loss=0.09471, over 3833775.79 frames. ], batch size: 70, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:15:45,537 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 38 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 21:15:45,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=751150.0, ans=0.125 2024-08-10 21:15:49,508 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 21:15:51,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=751250.0, ans=0.125 2024-08-10 21:16:03,125 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 21:16:03,385 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 21:16:07,669 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 21:16:16,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=751350.0, ans=0.1 2024-08-10 21:16:34,202 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 21:16:45,603 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 21:16:53,903 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=71.57 vs. limit=22.5 2024-08-10 21:17:02,387 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2700, loss[loss=0.1016, beats_loss=0.009936, ecapa_loss=0.0002325, whisper_loss=0.08938, over 16134.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01155, ecapa_loss=0.0002239, whisper_loss=0.09431, over 3840911.52 frames. ], batch size: 61, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:17:07,690 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 21:17:09,375 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
27 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 21:17:09,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=751650.0, ans=0.0 2024-08-10 21:18:03,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 3.016e+01 3.341e+01 3.971e+01 1.144e+02, threshold=6.682e+01, percent-clipped=3.0 2024-08-10 21:18:07,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=751950.0, ans=0.0 2024-08-10 21:18:15,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=752050.0, ans=0.0 2024-08-10 21:18:28,261 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2750, loss[loss=0.09216, beats_loss=0.01191, ecapa_loss=0.0001842, whisper_loss=0.0784, over 16415.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01154, ecapa_loss=0.000225, whisper_loss=0.09476, over 3856941.87 frames. ], batch size: 60, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:18:34,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=752150.0, ans=0.125 2024-08-10 21:18:47,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=752250.0, ans=0.0 2024-08-10 21:19:01,043 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 21:19:02,820 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 21:19:06,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=752350.0, ans=0.0 2024-08-10 21:19:30,670 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 21:19:33,306 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 21:19:47,445 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 21:19:53,675 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-10 21:19:54,844 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2800, loss[loss=0.09976, beats_loss=0.01101, ecapa_loss=0.0002528, whisper_loss=0.08621, over 17073.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01161, ecapa_loss=0.0002237, whisper_loss=0.09489, over 3860659.78 frames. ], batch size: 68, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:19:57,274 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 21:20:16,421 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-08-10 21:20:42,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=752950.0, ans=0.0 2024-08-10 21:20:50,462 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 21:20:53,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+01 2.721e+01 3.078e+01 3.353e+01 6.515e+01, threshold=6.156e+01, percent-clipped=0.0 2024-08-10 21:20:53,433 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 21:20:55,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=752950.0, ans=0.2 2024-08-10 21:21:02,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=753050.0, ans=0.1 2024-08-10 21:21:17,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=753050.0, ans=0.2 2024-08-10 21:21:19,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=753150.0, ans=0.0 2024-08-10 21:21:20,791 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2850, loss[loss=0.1259, beats_loss=0.01054, ecapa_loss=0.000196, whisper_loss=0.1134, over 18730.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0117, ecapa_loss=0.0002245, whisper_loss=0.09418, over 3872383.42 frames. ], batch size: 73, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:21:37,910 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-10 21:21:38,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.97 vs. limit=15.0 2024-08-10 21:21:48,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=753250.0, ans=0.125 2024-08-10 21:21:56,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=753250.0, ans=0.125 2024-08-10 21:22:53,085 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2900, loss[loss=0.1225, beats_loss=0.01103, ecapa_loss=0.0002456, whisper_loss=0.109, over 23176.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.0116, ecapa_loss=0.0002251, whisper_loss=0.09488, over 3857017.30 frames. 
], batch size: 91, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:22:53,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=753650.0, ans=0.0 2024-08-10 21:23:13,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=753750.0, ans=0.1 2024-08-10 21:23:53,387 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.758e+01 3.070e+01 3.678e+01 5.521e+01, threshold=6.141e+01, percent-clipped=0.0 2024-08-10 21:24:15,672 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 21:24:18,676 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 2950, loss[loss=0.1169, beats_loss=0.01096, ecapa_loss=0.0002183, whisper_loss=0.1038, over 16418.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01159, ecapa_loss=0.0002273, whisper_loss=0.09542, over 3883495.66 frames. ], batch size: 61, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:24:19,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=754150.0, ans=0.0 2024-08-10 21:24:22,022 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0 2024-08-10 21:24:23,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=754150.0, ans=0.125 2024-08-10 21:24:51,382 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 34 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 21:24:51,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. 
limit=15.0 2024-08-10 21:24:53,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=754350.0, ans=0.05 2024-08-10 21:24:56,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=754350.0, ans=0.2 2024-08-10 21:25:00,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=754350.0, ans=0.0 2024-08-10 21:25:28,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=754550.0, ans=0.0 2024-08-10 21:25:31,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754550.0, ans=0.1 2024-08-10 21:25:39,054 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3000, loss[loss=0.1021, beats_loss=0.01321, ecapa_loss=0.0002239, whisper_loss=0.08665, over 18548.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01153, ecapa_loss=0.0002271, whisper_loss=0.095, over 3885234.18 frames. ], batch size: 78, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:25:39,054 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 21:26:18,860 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on ASR_libri: loss=0.2598, beats_loss=0, ecapa_loss=0.0007066, whisper_loss=0.2527, over 922467.00 frames. 2024-08-10 21:26:38,577 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on SV_voxceleb1: loss=0.005938, beats_loss=0, ecapa_loss=0.0005938, whisper_loss=0, over 939242.00 frames. 2024-08-10 21:28:42,338 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on AT_audioset: loss=0.02614, beats_loss=0.02614, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 21:28:42,342 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 21:29:00,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=754750.0, ans=0.125 2024-08-10 21:29:00,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754750.0, ans=0.1 2024-08-10 21:29:28,645 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-10 21:29:34,076 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 21:29:38,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.874e+01 3.287e+01 3.873e+01 6.300e+01, threshold=6.573e+01, percent-clipped=1.0 2024-08-10 21:29:51,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=755050.0, ans=0.0 2024-08-10 21:30:02,262 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3050, loss[loss=0.08685, beats_loss=0.01058, ecapa_loss=0.0002718, whisper_loss=0.07355, over 14989.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01161, ecapa_loss=0.0002265, whisper_loss=0.09546, over 3908415.89 frames. ], batch size: 61, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:30:15,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=755150.0, ans=0.125 2024-08-10 21:30:17,927 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 21:30:22,390 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-10 21:30:30,465 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
19 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-10 21:30:46,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=755350.0, ans=0.125 2024-08-10 21:30:47,175 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-10 21:30:59,329 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 21:31:22,989 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3100, loss[loss=0.1113, beats_loss=0.01169, ecapa_loss=0.0002727, whisper_loss=0.09692, over 22022.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01157, ecapa_loss=0.0002263, whisper_loss=0.09595, over 3922548.30 frames. ], batch size: 92, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:31:23,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755650.0, ans=0.1 2024-08-10 21:31:23,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.65 vs. limit=12.0 2024-08-10 21:31:30,572 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 21:31:57,452 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-10 21:31:58,720 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 21:32:01,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=755850.0, ans=0.1 2024-08-10 21:32:05,878 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 21:32:07,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=755850.0, ans=0.125 2024-08-10 21:32:21,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.626e+01 2.939e+01 3.498e+01 4.571e+01, threshold=5.879e+01, percent-clipped=0.0 2024-08-10 21:32:27,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0 2024-08-10 21:32:44,803 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3150, loss[loss=0.1005, beats_loss=0.01166, ecapa_loss=0.0002653, whisper_loss=0.08621, over 17892.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.0116, ecapa_loss=0.000226, whisper_loss=0.09525, over 3923279.28 frames. ], batch size: 76, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:32:48,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=756150.0, ans=0.2 2024-08-10 21:32:56,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=756150.0, ans=0.0 2024-08-10 21:32:57,392 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 21:33:03,332 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 21:33:12,346 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 40 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 21:33:17,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=756350.0, ans=0.2 2024-08-10 21:33:20,035 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
21 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-10 21:33:33,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=756450.0, ans=0.0 2024-08-10 21:33:36,034 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 21:33:54,518 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 21:34:04,962 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3200, loss[loss=0.12, beats_loss=0.0115, ecapa_loss=0.00023, whisper_loss=0.1062, over 15697.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01156, ecapa_loss=0.0002261, whisper_loss=0.09608, over 3902249.80 frames. ], batch size: 62, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:34:08,896 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 21:34:09,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=756650.0, ans=0.0 2024-08-10 21:34:18,304 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 21:34:28,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=756750.0, ans=0.0 2024-08-10 21:34:30,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=756750.0, ans=0.125 2024-08-10 21:34:41,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=756850.0, ans=0.125 2024-08-10 21:34:49,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=756850.0, ans=0.0 2024-08-10 21:34:56,169 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
18 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-10 21:35:03,117 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.765e+01 3.113e+01 3.844e+01 7.476e+01, threshold=6.225e+01, percent-clipped=4.0 2024-08-10 21:35:10,689 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 21:35:26,959 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3250, loss[loss=0.09201, beats_loss=0.01144, ecapa_loss=0.0002885, whisper_loss=0.07768, over 20442.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01146, ecapa_loss=0.0002262, whisper_loss=0.09663, over 3909149.28 frames. ], batch size: 87, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:35:44,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.20 vs. limit=12.0 2024-08-10 21:36:08,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=757350.0, ans=0.0 2024-08-10 21:36:19,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=757450.0, ans=0.125 2024-08-10 21:36:30,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2024-08-10 21:36:34,764 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 21:36:50,141 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3300, loss[loss=0.107, beats_loss=0.01349, ecapa_loss=0.0002078, whisper_loss=0.09144, over 22703.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01158, ecapa_loss=0.0002261, whisper_loss=0.09549, over 3917389.88 frames. 
], batch size: 90, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:37:15,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=757750.0, ans=0.125 2024-08-10 21:37:32,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=757850.0, ans=0.125 2024-08-10 21:37:35,659 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-10 21:37:42,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=757950.0, ans=0.125 2024-08-10 21:37:50,387 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.783e+01 3.080e+01 3.590e+01 5.176e+01, threshold=6.160e+01, percent-clipped=0.0 2024-08-10 21:37:50,638 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 21:37:52,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=757950.0, ans=0.0 2024-08-10 21:38:01,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=758050.0, ans=0.0 2024-08-10 21:38:08,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=758050.0, ans=0.2 2024-08-10 21:38:10,091 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2024-08-10 21:38:15,675 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3350, loss[loss=0.1254, beats_loss=0.01087, ecapa_loss=0.000208, whisper_loss=0.1125, over 22876.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01154, ecapa_loss=0.0002262, whisper_loss=0.09575, over 3928220.70 frames. 
], batch size: 88, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:38:15,806 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 28 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-10 21:38:16,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=758150.0, ans=0.2 2024-08-10 21:38:27,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=758150.0, ans=0.1 2024-08-10 21:38:58,479 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-10 21:39:30,146 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 31 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 21:39:33,420 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3400, loss[loss=0.096, beats_loss=0.01127, ecapa_loss=0.0001889, whisper_loss=0.08284, over 22784.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01158, ecapa_loss=0.0002241, whisper_loss=0.09517, over 3941727.52 frames. ], batch size: 91, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:39:44,219 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-10 21:40:18,855 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 21:40:23,083 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 21:40:27,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=758950.0, ans=0.1 2024-08-10 21:40:32,009 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.745e+01 3.132e+01 3.636e+01 5.691e+01, threshold=6.264e+01, percent-clipped=0.0 2024-08-10 21:40:41,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=759050.0, ans=0.2 2024-08-10 21:40:56,033 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3450, loss[loss=0.129, beats_loss=0.008223, ecapa_loss=0.0002659, whisper_loss=0.1181, over 16437.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01155, ecapa_loss=0.0002246, whisper_loss=0.09468, over 3926069.48 frames. ], batch size: 62, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:41:07,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=759150.0, ans=0.0 2024-08-10 21:41:09,148 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 21:41:39,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=759350.0, ans=0.0 2024-08-10 21:41:52,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=759450.0, ans=0.125 2024-08-10 21:41:53,492 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 21:41:59,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.08 vs. 
limit=15.0 2024-08-10 21:42:05,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=759550.0, ans=0.125 2024-08-10 21:42:19,389 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3500, loss[loss=0.1025, beats_loss=0.01397, ecapa_loss=0.0001771, whisper_loss=0.08673, over 22872.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01162, ecapa_loss=0.0002248, whisper_loss=0.09412, over 3903521.65 frames. ], batch size: 91, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:42:25,298 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 21:42:27,480 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-10 21:42:28,867 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.19 vs. limit=15.0 2024-08-10 21:42:29,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=759650.0, ans=0.0 2024-08-10 21:42:35,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=759750.0, ans=0.0 2024-08-10 21:42:44,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=759750.0, ans=0.1 2024-08-10 21:42:50,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.26 vs. 
limit=22.5 2024-08-10 21:43:11,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.666e+01 2.958e+01 3.304e+01 6.870e+01, threshold=5.915e+01, percent-clipped=1.0 2024-08-10 21:43:16,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=759950.0, ans=0.0 2024-08-10 21:43:26,736 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0 2024-08-10 21:43:29,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=760050.0, ans=0.2 2024-08-10 21:43:31,727 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3550, loss[loss=0.1014, beats_loss=0.01289, ecapa_loss=0.0002534, whisper_loss=0.08594, over 20466.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01164, ecapa_loss=0.0002254, whisper_loss=0.09364, over 3886934.96 frames. ], batch size: 90, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:43:49,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.15 vs. limit=22.5 2024-08-10 21:44:19,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=760450.0, ans=0.125 2024-08-10 21:44:20,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=760450.0, ans=0.2 2024-08-10 21:44:36,910 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3600, loss[loss=0.08831, beats_loss=0.01306, ecapa_loss=0.0001927, whisper_loss=0.07333, over 19277.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01163, ecapa_loss=0.0002251, whisper_loss=0.09373, over 3886227.58 frames. 
], batch size: 80, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:44:57,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=760750.0, ans=0.125 2024-08-10 21:45:14,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=760850.0, ans=0.1 2024-08-10 21:45:21,740 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=15.0 2024-08-10 21:45:23,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.660e+01 3.011e+01 3.359e+01 4.667e+01, threshold=6.021e+01, percent-clipped=0.0 2024-08-10 21:45:25,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=12.0 2024-08-10 21:45:43,627 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3650, loss[loss=0.09226, beats_loss=0.01045, ecapa_loss=0.0002301, whisper_loss=0.07951, over 15186.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01158, ecapa_loss=0.0002258, whisper_loss=0.09356, over 3852591.80 frames. ], batch size: 60, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:45:47,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=761150.0, ans=0.125 2024-08-10 21:45:49,731 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 21:46:04,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=761250.0, ans=0.125 2024-08-10 21:46:16,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=761350.0, ans=0.2 2024-08-10 21:46:32,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=761450.0, ans=0.125 2024-08-10 21:46:46,480 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 21:46:48,727 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3700, loss[loss=0.107, beats_loss=0.01259, ecapa_loss=0.0001964, whisper_loss=0.09247, over 18231.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01158, ecapa_loss=0.0002274, whisper_loss=0.09392, over 3841363.66 frames. ], batch size: 70, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:47:00,818 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 21:47:25,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=761850.0, ans=10.0 2024-08-10 21:47:25,456 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.27 vs. 
limit=10.0 2024-08-10 21:47:34,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=761950.0, ans=0.0 2024-08-10 21:47:35,079 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.698e+01 3.015e+01 3.307e+01 5.689e+01, threshold=6.030e+01, percent-clipped=0.0 2024-08-10 21:47:52,070 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.69 vs. limit=22.5 2024-08-10 21:47:55,209 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3750, loss[loss=0.1119, beats_loss=0.01101, ecapa_loss=0.0001942, whisper_loss=0.09891, over 18852.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0116, ecapa_loss=0.0002274, whisper_loss=0.09374, over 3843239.04 frames. ], batch size: 73, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:48:19,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=762250.0, ans=15.0 2024-08-10 21:48:27,766 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 21:48:31,629 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 21:48:36,982 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 21:48:43,089 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
27 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-10 21:48:48,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=762550.0, ans=0.0 2024-08-10 21:48:56,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=762550.0, ans=0.0 2024-08-10 21:49:01,263 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3800, loss[loss=0.1094, beats_loss=0.01118, ecapa_loss=0.0001823, whisper_loss=0.09638, over 14300.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01161, ecapa_loss=0.000227, whisper_loss=0.09379, over 3822015.59 frames. ], batch size: 54, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:49:15,827 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-10 21:49:22,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=762750.0, ans=0.95 2024-08-10 21:49:24,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.69 vs. limit=22.5 2024-08-10 21:49:29,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=762850.0, ans=0.125 2024-08-10 21:49:33,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=762850.0, ans=0.125 2024-08-10 21:49:45,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=762950.0, ans=0.125 2024-08-10 21:49:47,419 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.822e+01 3.123e+01 3.627e+01 5.849e+01, threshold=6.246e+01, percent-clipped=0.0 2024-08-10 21:49:50,196 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-10 21:50:04,760 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-10 21:50:07,006 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3850, loss[loss=0.1011, beats_loss=0.01097, ecapa_loss=0.0002188, whisper_loss=0.08798, over 16093.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01166, ecapa_loss=0.0002271, whisper_loss=0.09313, over 3829877.83 frames. ], batch size: 63, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:50:24,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=763250.0, ans=0.125 2024-08-10 21:50:32,217 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 21:50:38,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=763350.0, ans=0.0 2024-08-10 21:50:40,346 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-10 21:50:45,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-10 21:50:50,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=763450.0, ans=0.0 2024-08-10 21:50:50,520 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5 2024-08-10 21:51:01,730 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 21:51:02,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=763550.0, ans=0.125 2024-08-10 21:51:10,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=763550.0, ans=0.125 2024-08-10 21:51:12,972 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3900, loss[loss=0.1331, beats_loss=0.01114, ecapa_loss=0.0002013, whisper_loss=0.12, over 23653.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01154, ecapa_loss=0.0002269, whisper_loss=0.09424, over 3807599.99 frames. ], batch size: 89, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:51:14,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=763650.0, ans=0.125 2024-08-10 21:51:15,648 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-10 21:51:32,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=763750.0, ans=15.0 2024-08-10 21:51:36,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=763750.0, ans=0.125 2024-08-10 21:51:42,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5 2024-08-10 21:51:45,635 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 21:51:58,006 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 21:51:58,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.868e+01 3.190e+01 3.521e+01 6.195e+01, threshold=6.380e+01, percent-clipped=0.0 2024-08-10 21:52:00,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=763950.0, ans=0.125 2024-08-10 21:52:03,659 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.10 vs. limit=15.0 2024-08-10 21:52:17,741 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 3950, loss[loss=0.1279, beats_loss=0.01177, ecapa_loss=0.0002455, whisper_loss=0.1137, over 22135.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01158, ecapa_loss=0.000225, whisper_loss=0.09469, over 3827595.11 frames. ], batch size: 89, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:52:26,187 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.624e+05 2024-08-10 21:52:31,048 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-10 21:52:43,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=764350.0, ans=0.1 2024-08-10 21:52:53,552 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-10 21:52:56,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=764450.0, ans=0.125 2024-08-10 21:53:10,340 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.26 vs. 
limit=15.0 2024-08-10 21:53:12,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=764550.0, ans=0.0 2024-08-10 21:53:24,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4000, loss[loss=0.1195, beats_loss=0.01133, ecapa_loss=0.0001712, whisper_loss=0.1065, over 19268.00 frames. ], tot_loss[loss=0.109, beats_loss=0.0115, ecapa_loss=0.0002265, whisper_loss=0.09523, over 3882627.02 frames. ], batch size: 70, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:54:03,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=764950.0, ans=0.2 2024-08-10 21:54:09,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.764e+01 3.105e+01 3.573e+01 5.750e+01, threshold=6.210e+01, percent-clipped=0.0 2024-08-10 21:54:22,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=765050.0, ans=0.125 2024-08-10 21:54:28,625 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 from AS 2024-08-10 21:54:29,650 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4050, loss[loss=0.08618, beats_loss=0.01146, ecapa_loss=0.0001995, whisper_loss=0.07272, over 15636.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01147, ecapa_loss=0.0002275, whisper_loss=0.0955, over 3903117.39 frames. ], batch size: 59, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:54:41,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=765250.0, ans=0.0 2024-08-10 21:54:50,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=765250.0, ans=0.0 2024-08-10 21:55:07,015 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
19 from LS+wenet, 17 from Vox, 25 from AS 2024-08-10 21:55:26,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=765550.0, ans=0.125 2024-08-10 21:55:28,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=765550.0, ans=0.2 2024-08-10 21:55:28,741 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.44 vs. limit=22.5 2024-08-10 21:55:34,521 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4100, loss[loss=0.1009, beats_loss=0.0131, ecapa_loss=0.0001948, whisper_loss=0.08583, over 22210.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01147, ecapa_loss=0.0002251, whisper_loss=0.09557, over 3905186.20 frames. ], batch size: 90, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:55:37,354 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 30 from Vox, 38 from AS 2024-08-10 21:55:41,289 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 from AS 2024-08-10 21:55:46,553 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 19 from Vox, 26 from AS 2024-08-10 21:55:49,161 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 from AS 2024-08-10 21:55:52,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=765750.0, ans=0.0 2024-08-10 21:55:56,537 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.41 vs. 
limit=22.5 2024-08-10 21:56:11,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=765850.0, ans=0.015 2024-08-10 21:56:13,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=765950.0, ans=0.125 2024-08-10 21:56:20,900 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+01 2.758e+01 3.048e+01 3.457e+01 5.910e+01, threshold=6.096e+01, percent-clipped=0.0 2024-08-10 21:56:40,359 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0 2024-08-10 21:56:40,718 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4150, loss[loss=0.1014, beats_loss=0.01105, ecapa_loss=0.0002036, whisper_loss=0.08834, over 17069.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01153, ecapa_loss=0.0002255, whisper_loss=0.09496, over 3885955.01 frames. ], batch size: 63, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:56:42,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=766150.0, ans=0.125 2024-08-10 21:56:42,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=766150.0, ans=0.125 2024-08-10 21:56:48,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.29 vs. limit=15.0 2024-08-10 21:56:52,306 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 21:56:56,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=766250.0, ans=0.09899494936611666 2024-08-10 21:57:08,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=766350.0, ans=0.0 2024-08-10 21:57:26,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=766450.0, ans=0.09899494936611666 2024-08-10 21:57:46,005 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4200, loss[loss=0.1091, beats_loss=0.01195, ecapa_loss=0.0002146, whisper_loss=0.09496, over 19799.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01145, ecapa_loss=0.000225, whisper_loss=0.09555, over 3881597.73 frames. ], batch size: 79, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:57:49,645 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.20 vs. limit=6.0 2024-08-10 21:57:57,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=766650.0, ans=0.2 2024-08-10 21:58:29,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=766950.0, ans=0.0 2024-08-10 21:58:31,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.720e+01 3.062e+01 3.636e+01 5.115e+01, threshold=6.123e+01, percent-clipped=0.0 2024-08-10 21:58:51,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4250, loss[loss=0.1231, beats_loss=0.009826, ecapa_loss=0.0002828, whisper_loss=0.1104, over 22033.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01143, ecapa_loss=0.0002249, whisper_loss=0.09551, over 3880051.90 frames. 
], batch size: 91, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:59:02,541 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS 2024-08-10 21:59:09,206 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 27 from Vox, 43 from AS 2024-08-10 21:59:10,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=767250.0, ans=0.1 2024-08-10 21:59:12,480 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2024-08-10 21:59:18,159 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 21 from Vox, 46 from AS 2024-08-10 21:59:18,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=767350.0, ans=0.125 2024-08-10 21:59:19,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2024-08-10 21:59:32,360 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 from AS 2024-08-10 21:59:57,285 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4300, loss[loss=0.1029, beats_loss=0.01255, ecapa_loss=0.0002358, whisper_loss=0.08804, over 20943.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01144, ecapa_loss=0.0002235, whisper_loss=0.09558, over 3877909.24 frames. ], batch size: 86, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:00:03,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.93 vs. 
limit=15.0 2024-08-10 22:00:13,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=767750.0, ans=0.125 2024-08-10 22:00:34,399 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 9 from LS+wenet, 18 from Vox, 27 from AS 2024-08-10 22:00:43,523 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.655e+01 2.968e+01 3.386e+01 7.323e+01, threshold=5.937e+01, percent-clipped=2.0 2024-08-10 22:00:59,788 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 11 from LS+wenet, 23 from Vox, 29 from AS 2024-08-10 22:01:00,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2024-08-10 22:01:03,653 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4350, loss[loss=0.08654, beats_loss=0.01499, ecapa_loss=0.0002043, whisper_loss=0.0695, over 14117.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01151, ecapa_loss=0.0002239, whisper_loss=0.09529, over 3874418.96 frames. ], batch size: 57, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:01:08,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=768150.0, ans=0.125 2024-08-10 22:01:11,221 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.34 vs. limit=15.0 2024-08-10 22:01:24,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=768250.0, ans=0.125 2024-08-10 22:01:41,037 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-08-10 22:02:01,434 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
23 from LS+wenet, 11 from Vox, 33 from AS 2024-08-10 22:02:01,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=768550.0, ans=0.125 2024-08-10 22:02:05,231 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 from AS 2024-08-10 22:02:08,805 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4400, loss[loss=0.1069, beats_loss=0.009725, ecapa_loss=0.0001995, whisper_loss=0.09515, over 16361.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01147, ecapa_loss=0.0002237, whisper_loss=0.09507, over 3843587.56 frames. ], batch size: 62, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:02:33,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=768850.0, ans=0.125 2024-08-10 22:02:36,520 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS 2024-08-10 22:02:36,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=768850.0, ans=0.125 2024-08-10 22:02:39,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. 
limit=15.0 2024-08-10 22:02:46,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=768850.0, ans=0.125 2024-08-10 22:02:55,234 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.875e+01 3.279e+01 3.849e+01 6.433e+01, threshold=6.559e+01, percent-clipped=3.0 2024-08-10 22:02:58,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=768950.0, ans=0.0 2024-08-10 22:03:14,730 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4450, loss[loss=0.0959, beats_loss=0.01484, ecapa_loss=0.0002496, whisper_loss=0.07857, over 17689.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01153, ecapa_loss=0.0002237, whisper_loss=0.09466, over 3850482.36 frames. ], batch size: 77, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:03:16,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=769150.0, ans=0.2 2024-08-10 22:03:19,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=769150.0, ans=0.125 2024-08-10 22:03:23,126 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 from AS 2024-08-10 22:03:47,727 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 from AS 2024-08-10 22:03:49,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=769350.0, ans=0.125 2024-08-10 22:03:53,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=12.0 2024-08-10 22:03:53,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.29 vs. 
limit=22.5 2024-08-10 22:03:55,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=769450.0, ans=0.125 2024-08-10 22:03:56,880 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 from AS 2024-08-10 22:03:59,383 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 from AS 2024-08-10 22:04:00,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=769450.0, ans=0.2 2024-08-10 22:04:10,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.61 vs. limit=5.0 2024-08-10 22:04:12,251 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 13 from Vox, 36 from AS 2024-08-10 22:04:12,513 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 22:04:20,300 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4500, loss[loss=0.1154, beats_loss=0.01102, ecapa_loss=0.0002169, whisper_loss=0.1022, over 22464.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01151, ecapa_loss=0.0002239, whisper_loss=0.09512, over 3876636.93 frames. ], batch size: 89, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:04:20,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=769650.0, ans=0.125 2024-08-10 22:04:30,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=769650.0, ans=0.125 2024-08-10 22:04:36,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=769750.0, ans=0.125 2024-08-10 22:04:43,718 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
19 from LS+wenet, 20 from Vox, 27 from AS 2024-08-10 22:04:45,653 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=12.0 2024-08-10 22:04:46,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769850.0, ans=0.1 2024-08-10 22:04:54,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=769850.0, ans=0.0 2024-08-10 22:04:59,424 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 from AS 2024-08-10 22:05:03,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=769950.0, ans=0.0 2024-08-10 22:05:05,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.643e+01 3.102e+01 3.659e+01 7.014e+01, threshold=6.204e+01, percent-clipped=1.0 2024-08-10 22:05:06,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=769950.0, ans=0.125 2024-08-10 22:05:07,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=769950.0, ans=0.125 2024-08-10 22:05:09,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=769950.0, ans=0.0 2024-08-10 22:05:12,029 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 25 from Vox, 33 from AS 2024-08-10 22:05:14,716 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 22 from Vox, 42 from AS 2024-08-10 22:05:14,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=770050.0, ans=0.125 2024-08-10 22:05:22,661 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
20 from LS+wenet, 17 from Vox, 25 from AS 2024-08-10 22:05:25,053 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4550, loss[loss=0.0918, beats_loss=0.01134, ecapa_loss=0.0002705, whisper_loss=0.07775, over 17909.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01157, ecapa_loss=0.000224, whisper_loss=0.09463, over 3894090.18 frames. ], batch size: 72, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:05:31,775 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 17 from Vox, 35 from AS 2024-08-10 22:05:33,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=770150.0, ans=0.125 2024-08-10 22:05:33,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=770150.0, ans=0.0 2024-08-10 22:05:51,121 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 from AS 2024-08-10 22:06:26,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=770550.0, ans=0.0 2024-08-10 22:06:30,325 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4600, loss[loss=0.09926, beats_loss=0.01185, ecapa_loss=0.0001913, whisper_loss=0.08549, over 16915.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01168, ecapa_loss=0.0002218, whisper_loss=0.09442, over 3892131.36 frames. ], batch size: 64, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:06:32,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770650.0, ans=0.1 2024-08-10 22:06:32,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=770650.0, ans=0.0 2024-08-10 22:06:40,000 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 25 from Vox, 36 from AS 2024-08-10 22:07:16,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.935e+01 3.290e+01 3.824e+01 6.429e+01, threshold=6.581e+01, percent-clipped=1.0 2024-08-10 22:07:21,140 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 from AS 2024-08-10 22:07:36,689 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4650, loss[loss=0.1004, beats_loss=0.01375, ecapa_loss=0.0001829, whisper_loss=0.08479, over 15913.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01176, ecapa_loss=0.0002232, whisper_loss=0.09319, over 3872478.80 frames. ], batch size: 63, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:07:42,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=771150.0, ans=0.2 2024-08-10 22:08:05,811 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 from AS 2024-08-10 22:08:13,756 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 from AS 2024-08-10 22:08:20,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771450.0, ans=0.1 2024-08-10 22:08:31,060 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 20 from Vox, 46 from AS 2024-08-10 22:08:43,026 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4700, loss[loss=0.1028, beats_loss=0.00953, ecapa_loss=0.0001787, whisper_loss=0.09151, over 16686.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01174, ecapa_loss=0.0002226, whisper_loss=0.09376, over 3875130.80 frames. 
], batch size: 60, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:08:47,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=771650.0, ans=0.125 2024-08-10 22:09:14,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=771850.0, ans=0.125 2024-08-10 22:09:21,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=771850.0, ans=0.0 2024-08-10 22:09:29,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.713e+01 3.049e+01 3.532e+01 5.514e+01, threshold=6.097e+01, percent-clipped=0.0 2024-08-10 22:09:31,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=771950.0, ans=0.125 2024-08-10 22:09:32,364 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 24 from Vox, 26 from AS 2024-08-10 22:09:43,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772050.0, ans=0.1 2024-08-10 22:09:46,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=772050.0, ans=0.0 2024-08-10 22:09:49,174 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4750, loss[loss=0.1268, beats_loss=0.009872, ecapa_loss=0.0002212, whisper_loss=0.1147, over 17332.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01174, ecapa_loss=0.0002227, whisper_loss=0.09373, over 3873226.71 frames. 
], batch size: 66, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:09:55,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=772150.0, ans=0.125 2024-08-10 22:09:55,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=772150.0, ans=0.2 2024-08-10 22:10:04,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=772250.0, ans=0.125 2024-08-10 22:10:08,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=772250.0, ans=0.2 2024-08-10 22:10:09,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=772250.0, ans=0.125 2024-08-10 22:10:24,358 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 from AS 2024-08-10 22:10:27,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=772350.0, ans=0.0 2024-08-10 22:10:33,812 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=12.0 2024-08-10 22:10:38,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=772450.0, ans=0.09899494936611666 2024-08-10 22:10:55,387 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4800, loss[loss=0.1395, beats_loss=0.009463, ecapa_loss=0.0002294, whisper_loss=0.1277, over 16412.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.0118, ecapa_loss=0.0002225, whisper_loss=0.09456, over 3914936.51 frames. ], batch size: 64, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:10:59,820 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 17 from Vox, 44 from AS 2024-08-10 22:11:01,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=772650.0, ans=0.05 2024-08-10 22:11:05,308 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.083e-01 2024-08-10 22:11:05,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772650.0, ans=0.1 2024-08-10 22:11:12,457 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 17 from Vox, 37 from AS 2024-08-10 22:11:33,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=772850.0, ans=0.0 2024-08-10 22:11:35,307 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 from AS 2024-08-10 22:11:40,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=772950.0, ans=0.125 2024-08-10 22:11:41,624 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.673e+01 3.089e+01 3.492e+01 5.456e+01, threshold=6.177e+01, percent-clipped=0.0 2024-08-10 22:11:43,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=772950.0, ans=0.0 2024-08-10 22:11:49,693 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0 2024-08-10 22:11:59,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=773150.0, ans=0.0 2024-08-10 22:12:00,836 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4850, loss[loss=0.1022, beats_loss=0.0138, ecapa_loss=0.0002007, whisper_loss=0.08643, over 14529.00 frames. 
], tot_loss[loss=0.1081, beats_loss=0.01185, ecapa_loss=0.0002221, whisper_loss=0.09402, over 3942162.95 frames. ], batch size: 58, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:12:02,385 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 22:12:41,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=773450.0, ans=0.0 2024-08-10 22:13:02,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=773550.0, ans=0.125 2024-08-10 22:13:06,799 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4900, loss[loss=0.1106, beats_loss=0.0127, ecapa_loss=0.0002276, whisper_loss=0.09561, over 17731.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01175, ecapa_loss=0.0002241, whisper_loss=0.09475, over 3923134.95 frames. ], batch size: 70, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:13:07,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=773650.0, ans=0.0 2024-08-10 22:13:08,294 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 from AS 2024-08-10 22:13:08,917 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.86 vs. limit=15.0 2024-08-10 22:13:24,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=773750.0, ans=0.125 2024-08-10 22:13:29,647 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
27 from LS+wenet, 15 from Vox, 30 from AS 2024-08-10 22:13:35,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=773850.0, ans=0.0 2024-08-10 22:13:45,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=773950.0, ans=0.125 2024-08-10 22:13:47,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=12.0 2024-08-10 22:13:53,183 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 2.882e+01 3.230e+01 4.059e+01 7.454e+01, threshold=6.460e+01, percent-clipped=3.0 2024-08-10 22:14:02,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=774050.0, ans=0.0 2024-08-10 22:14:12,321 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 4950, loss[loss=0.1056, beats_loss=0.01087, ecapa_loss=0.0002372, whisper_loss=0.09234, over 14705.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01181, ecapa_loss=0.0002247, whisper_loss=0.09409, over 3912177.37 frames. ], batch size: 56, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:14:21,698 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 13 from Vox, 25 from AS 2024-08-10 22:14:22,370 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2024-08-10 22:14:31,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774250.0, ans=0.1 2024-08-10 22:14:35,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=12.0 2024-08-10 22:14:57,389 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 23 from Vox, 39 from AS 2024-08-10 22:15:06,420 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 from AS 2024-08-10 22:15:18,638 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5000, loss[loss=0.127, beats_loss=0.01044, ecapa_loss=0.0002144, whisper_loss=0.1144, over 22924.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01171, ecapa_loss=0.0002251, whisper_loss=0.09504, over 3895052.73 frames. ], batch size: 90, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:15:26,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=774650.0, ans=0.125 2024-08-10 22:15:45,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774850.0, ans=0.1 2024-08-10 22:16:07,422 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.646e+01 2.904e+01 3.171e+01 4.689e+01, threshold=5.808e+01, percent-clipped=0.0 2024-08-10 22:16:14,435 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=15.0 2024-08-10 22:16:30,409 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5050, loss[loss=0.1061, beats_loss=0.01317, ecapa_loss=0.0002087, whisper_loss=0.09083, over 21756.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01171, ecapa_loss=0.0002249, whisper_loss=0.09544, over 3890964.49 frames. 
], batch size: 90, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:16:32,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=775150.0, ans=0.125 2024-08-10 22:17:02,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775350.0, ans=0.1 2024-08-10 22:17:11,838 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=12.0 2024-08-10 22:17:14,053 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 22:17:18,546 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 22:17:20,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=775450.0, ans=0.125 2024-08-10 22:17:34,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=775550.0, ans=0.125 2024-08-10 22:17:47,102 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5100, loss[loss=0.1087, beats_loss=0.009576, ecapa_loss=0.0002257, whisper_loss=0.09684, over 22242.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01176, ecapa_loss=0.0002233, whisper_loss=0.09436, over 3906149.97 frames. 
], batch size: 91, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:18:25,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=775850.0, ans=0.0 2024-08-10 22:18:51,528 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.763e+01 3.180e+01 3.560e+01 6.035e+01, threshold=6.359e+01, percent-clipped=1.0 2024-08-10 22:19:13,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=776050.0, ans=0.2 2024-08-10 22:19:16,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5150, loss[loss=0.1296, beats_loss=0.009288, ecapa_loss=0.0002141, whisper_loss=0.1182, over 14900.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01171, ecapa_loss=0.0002205, whisper_loss=0.09508, over 3908911.81 frames. ], batch size: 57, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:19:19,814 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-10 22:19:39,668 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-10 22:19:44,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2024-08-10 22:19:49,531 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 22:20:02,207 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 22:20:18,149 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.30 vs. 
limit=15.0 2024-08-10 22:20:24,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=776450.0, ans=0.125 2024-08-10 22:20:49,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=776550.0, ans=12.0 2024-08-10 22:21:03,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5200, loss[loss=0.1055, beats_loss=0.01056, ecapa_loss=0.000217, whisper_loss=0.09272, over 21771.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01162, ecapa_loss=0.0002225, whisper_loss=0.09486, over 3886053.72 frames. ], batch size: 86, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:21:29,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=776750.0, ans=0.125 2024-08-10 22:21:38,674 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 22:22:07,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=776950.0, ans=0.0 2024-08-10 22:22:11,478 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.788e+01 3.076e+01 3.692e+01 5.822e+01, threshold=6.152e+01, percent-clipped=0.0 2024-08-10 22:22:19,890 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-10 22:22:24,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=777050.0, ans=0.0 2024-08-10 22:22:32,730 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 22:22:42,614 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5250, loss[loss=0.1089, beats_loss=0.0119, ecapa_loss=0.0002175, whisper_loss=0.09479, over 14455.00 frames. 
], tot_loss[loss=0.1081, beats_loss=0.01162, ecapa_loss=0.0002234, whisper_loss=0.09427, over 3841296.40 frames. ], batch size: 58, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:23:06,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=777250.0, ans=0.125 2024-08-10 22:23:09,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-10 22:23:13,856 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 22:23:31,822 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 22:23:45,578 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 22:24:01,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=777450.0, ans=0.1 2024-08-10 22:24:08,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=777450.0, ans=0.5 2024-08-10 22:24:09,330 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.21 vs. limit=15.0 2024-08-10 22:24:23,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=777550.0, ans=0.125 2024-08-10 22:24:38,055 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5300, loss[loss=0.1026, beats_loss=0.01317, ecapa_loss=0.0002336, whisper_loss=0.08705, over 23230.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01158, ecapa_loss=0.0002248, whisper_loss=0.09451, over 3856508.52 frames. 
], batch size: 96, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:24:44,644 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 39 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 22:25:31,341 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 22:25:35,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=777850.0, ans=0.125 2024-08-10 22:25:44,419 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 22:25:51,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=12.0 2024-08-10 22:26:02,637 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.845e+01 3.130e+01 3.652e+01 5.218e+01, threshold=6.259e+01, percent-clipped=0.0 2024-08-10 22:26:28,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=778050.0, ans=0.2 2024-08-10 22:26:38,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5350, loss[loss=0.1215, beats_loss=0.01112, ecapa_loss=0.0002352, whisper_loss=0.1081, over 22156.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01163, ecapa_loss=0.000224, whisper_loss=0.09434, over 3868206.52 frames. ], batch size: 90, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:26:47,739 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.67 vs. limit=15.0 2024-08-10 22:27:19,295 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs. 
limit=15.0 2024-08-10 22:28:05,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=778450.0, ans=0.1 2024-08-10 22:28:15,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=778550.0, ans=0.125 2024-08-10 22:28:18,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=778550.0, ans=0.125 2024-08-10 22:28:23,154 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 31 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 22:28:27,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.50 vs. limit=10.0 2024-08-10 22:28:27,619 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5400, loss[loss=0.1197, beats_loss=0.01096, ecapa_loss=0.0002314, whisper_loss=0.1064, over 22189.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01162, ecapa_loss=0.0002223, whisper_loss=0.09443, over 3869822.59 frames. ], batch size: 88, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:28:34,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=778650.0, ans=0.95 2024-08-10 22:28:40,904 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 22:28:55,777 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-10 22:28:57,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778750.0, ans=0.1 2024-08-10 22:29:14,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=778950.0, ans=0.2 2024-08-10 22:29:18,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=778950.0, ans=0.125 2024-08-10 22:29:24,780 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.711e+01 3.100e+01 3.573e+01 5.377e+01, threshold=6.200e+01, percent-clipped=0.0 2024-08-10 22:29:29,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=778950.0, ans=0.125 2024-08-10 22:29:40,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=779050.0, ans=0.125 2024-08-10 22:29:50,140 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-08-10 22:29:50,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5450, loss[loss=0.08141, beats_loss=0.0116, ecapa_loss=0.0002264, whisper_loss=0.06755, over 14704.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01164, ecapa_loss=0.0002218, whisper_loss=0.09402, over 3839106.55 frames. ], batch size: 58, lr: 1.05e-02, grad_scale: 8796093022208.0 2024-08-10 22:29:57,254 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 22:30:11,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=779250.0, ans=0.2 2024-08-10 22:30:12,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.08 vs. limit=15.0 2024-08-10 22:30:13,487 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 22:30:18,661 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 22:30:23,985 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.62 vs. limit=22.5 2024-08-10 22:30:49,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=779450.0, ans=0.0 2024-08-10 22:31:06,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=779550.0, ans=0.125 2024-08-10 22:31:06,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=779550.0, ans=0.1 2024-08-10 22:31:17,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=779550.0, ans=0.125 2024-08-10 22:31:23,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5500, loss[loss=0.1148, beats_loss=0.01077, ecapa_loss=0.000247, whisper_loss=0.1016, over 21726.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01159, ecapa_loss=0.0002211, whisper_loss=0.09472, over 3851752.76 frames. 
], batch size: 90, lr: 1.05e-02, grad_scale: 8796093022208.0 2024-08-10 22:31:24,223 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=9.476e-01 2024-08-10 22:31:25,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=779650.0, ans=0.025 2024-08-10 22:31:30,755 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 22:31:49,813 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-10 22:31:57,424 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-10 22:32:08,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=779850.0, ans=0.1 2024-08-10 22:32:21,285 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 22:32:27,115 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-10 22:32:29,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.640e+01 3.152e+01 3.892e+01 6.209e+01, threshold=6.304e+01, percent-clipped=1.0 2024-08-10 22:32:32,026 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 22:32:32,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779950.0, ans=0.1 2024-08-10 22:32:45,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=780050.0, ans=0.0 2024-08-10 22:32:52,679 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 22:32:54,633 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 22:32:58,529 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5550, loss[loss=0.09353, beats_loss=0.01253, ecapa_loss=0.0002751, whisper_loss=0.07824, over 15732.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01163, ecapa_loss=0.0002221, whisper_loss=0.09459, over 3894474.57 frames. ], batch size: 66, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:32:58,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=780150.0, ans=0.5 2024-08-10 22:33:29,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780250.0, ans=0.1 2024-08-10 22:33:31,539 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-10 22:33:34,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-08-10 22:33:38,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.92 vs. limit=22.5 2024-08-10 22:33:50,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=780350.0, ans=0.125 2024-08-10 22:34:10,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0 2024-08-10 22:34:25,948 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
27 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 22:34:33,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5600, loss[loss=0.1032, beats_loss=0.01166, ecapa_loss=0.0002879, whisper_loss=0.08861, over 17194.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0116, ecapa_loss=0.0002215, whisper_loss=0.09449, over 3889965.80 frames. ], batch size: 74, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:34:35,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=15.0 2024-08-10 22:34:41,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=780650.0, ans=0.2 2024-08-10 22:34:52,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=12.0 2024-08-10 22:34:56,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=780750.0, ans=0.125 2024-08-10 22:35:02,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=780750.0, ans=0.125 2024-08-10 22:35:15,128 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 22:35:19,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=780850.0, ans=0.0 2024-08-10 22:35:21,789 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 22:35:26,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.93 vs. limit=22.5 2024-08-10 22:35:29,522 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 22:35:34,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=780950.0, ans=0.0 2024-08-10 22:35:36,149 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.826e+01 3.158e+01 3.731e+01 5.525e+01, threshold=6.316e+01, percent-clipped=0.0 2024-08-10 22:35:38,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=780950.0, ans=0.05 2024-08-10 22:36:00,906 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-10 22:36:04,124 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5650, loss[loss=0.1151, beats_loss=0.01235, ecapa_loss=0.0002446, whisper_loss=0.1003, over 22006.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01159, ecapa_loss=0.0002221, whisper_loss=0.09435, over 3909318.14 frames. ], batch size: 88, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:36:34,614 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 22:36:49,152 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-10 22:37:12,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=781450.0, ans=0.025 2024-08-10 22:37:24,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=781550.0, ans=0.125 2024-08-10 22:37:25,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.17 vs. 
limit=6.0 2024-08-10 22:37:35,097 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5700, loss[loss=0.09285, beats_loss=0.01551, ecapa_loss=0.0002135, whisper_loss=0.0752, over 15366.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01165, ecapa_loss=0.0002225, whisper_loss=0.09381, over 3897424.73 frames. ], batch size: 62, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:38:00,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=781750.0, ans=0.0 2024-08-10 22:38:29,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=781950.0, ans=0.125 2024-08-10 22:38:31,470 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 22:38:38,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=781950.0, ans=0.0 2024-08-10 22:38:40,026 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+01 2.917e+01 3.187e+01 3.836e+01 6.311e+01, threshold=6.373e+01, percent-clipped=0.0 2024-08-10 22:38:56,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=782050.0, ans=0.125 2024-08-10 22:38:59,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=782050.0, ans=0.125 2024-08-10 22:39:06,578 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5750, loss[loss=0.1143, beats_loss=0.01041, ecapa_loss=0.0001926, whisper_loss=0.102, over 18259.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01162, ecapa_loss=0.0002224, whisper_loss=0.09414, over 3895619.87 frames. 
], batch size: 70, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:39:07,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=782150.0, ans=0.0 2024-08-10 22:39:08,621 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 22:39:13,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=782150.0, ans=0.0 2024-08-10 22:39:14,940 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-10 22:39:29,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=782250.0, ans=0.1 2024-08-10 22:39:32,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=782250.0, ans=0.125 2024-08-10 22:39:55,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=782350.0, ans=0.125 2024-08-10 22:39:55,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=782350.0, ans=0.125 2024-08-10 22:40:00,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=782450.0, ans=0.125 2024-08-10 22:40:33,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=782550.0, ans=0.0 2024-08-10 22:40:39,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5800, loss[loss=0.1006, beats_loss=0.01244, ecapa_loss=0.0002624, whisper_loss=0.08552, over 21224.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01158, ecapa_loss=0.0002235, whisper_loss=0.09367, over 3854305.96 frames. 
], batch size: 92, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:41:02,527 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0 2024-08-10 22:41:21,500 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 22:41:29,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=782850.0, ans=0.2 2024-08-10 22:41:32,523 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-10 22:41:38,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=782950.0, ans=0.125 2024-08-10 22:41:38,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2024-08-10 22:41:44,317 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.672e+01 3.034e+01 3.531e+01 4.962e+01, threshold=6.068e+01, percent-clipped=0.0 2024-08-10 22:41:51,226 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 22:42:00,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=783050.0, ans=0.125 2024-08-10 22:42:12,204 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5850, loss[loss=0.09401, beats_loss=0.01396, ecapa_loss=0.0001904, whisper_loss=0.07814, over 22373.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.0116, ecapa_loss=0.0002227, whisper_loss=0.09369, over 3845027.35 frames. 
], batch size: 90, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:42:17,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=12.0 2024-08-10 22:42:18,997 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 22:42:34,975 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-10 22:43:28,085 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2024-08-10 22:43:29,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=783550.0, ans=0.0 2024-08-10 22:43:29,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=783550.0, ans=0.05 2024-08-10 22:43:29,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783550.0, ans=0.1 2024-08-10 22:43:30,720 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-10 22:43:31,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=783550.0, ans=0.2 2024-08-10 22:43:41,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5900, loss[loss=0.08992, beats_loss=0.01444, ecapa_loss=0.0002013, whisper_loss=0.07347, over 17908.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01158, ecapa_loss=0.0002225, whisper_loss=0.09424, over 3838902.10 frames. ], batch size: 73, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:43:42,054 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 22:43:59,530 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-10 22:44:19,472 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 22:44:19,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=783850.0, ans=0.125 2024-08-10 22:44:47,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.739e+01 3.064e+01 3.610e+01 4.850e+01, threshold=6.128e+01, percent-clipped=0.0 2024-08-10 22:44:51,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=783950.0, ans=0.125 2024-08-10 22:44:58,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.81 vs. limit=22.5 2024-08-10 22:44:58,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2024-08-10 22:45:15,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 5950, loss[loss=0.09705, beats_loss=0.01295, ecapa_loss=0.0002814, whisper_loss=0.08129, over 19324.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01166, ecapa_loss=0.0002217, whisper_loss=0.09394, over 3845808.26 frames. ], batch size: 81, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:45:34,817 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 22:45:38,056 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 19 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 22:46:18,221 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 22:46:46,112 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6000, loss[loss=0.1398, beats_loss=0.008551, ecapa_loss=0.000211, whisper_loss=0.1291, over 15867.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01163, ecapa_loss=0.0002217, whisper_loss=0.09412, over 3814259.63 frames. ], batch size: 59, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:46:46,113 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-10 22:47:10,409 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2985, 3.1195, 2.7542, 2.6928], device='cuda:2') 2024-08-10 22:47:25,473 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on ASR_libri: loss=0.2592, beats_loss=0, ecapa_loss=0.0006893, whisper_loss=0.2523, over 922467.00 frames. 2024-08-10 22:47:44,049 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on SV_voxceleb1: loss=0.005715, beats_loss=0, ecapa_loss=0.0005715, whisper_loss=0, over 939242.00 frames. 2024-08-10 22:48:39,718 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7385, 2.2481, 1.6953, 1.1908], device='cuda:2') 2024-08-10 22:49:35,381 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on AT_audioset: loss=0.02616, beats_loss=0.02616, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 22:49:35,385 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-10 22:49:43,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=784650.0, ans=0.125 2024-08-10 22:50:20,919 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 22:50:21,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.32 vs. limit=10.0 2024-08-10 22:50:30,618 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 22:50:32,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=784950.0, ans=0.0 2024-08-10 22:50:33,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.532e+01 2.986e+01 3.661e+01 5.128e+01, threshold=5.971e+01, percent-clipped=0.0 2024-08-10 22:50:53,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=785050.0, ans=0.1 2024-08-10 22:50:54,969 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 22:50:55,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=785050.0, ans=0.0 2024-08-10 22:51:00,032 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6050, loss[loss=0.1087, beats_loss=0.0103, ecapa_loss=0.0002693, whisper_loss=0.09568, over 15088.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01151, ecapa_loss=0.0002208, whisper_loss=0.09448, over 3798848.98 frames. ], batch size: 60, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:51:12,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=785150.0, ans=0.125 2024-08-10 22:51:28,150 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 22:51:39,557 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 22:52:12,007 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
21 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 22:52:16,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=785550.0, ans=0.125 2024-08-10 22:52:35,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2024-08-10 22:52:35,586 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6100, loss[loss=0.1136, beats_loss=0.01041, ecapa_loss=0.0002002, whisper_loss=0.1011, over 20088.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01165, ecapa_loss=0.00022, whisper_loss=0.09361, over 3819766.83 frames. ], batch size: 75, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:52:43,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=785650.0, ans=0.0 2024-08-10 22:52:50,415 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 22:52:50,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=785650.0, ans=0.0 2024-08-10 22:52:58,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=785750.0, ans=0.125 2024-08-10 22:52:59,397 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-10 22:53:09,394 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-10 22:53:15,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=785850.0, ans=0.125 2024-08-10 22:53:17,737 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
34 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 22:53:25,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.37 vs. limit=10.0 2024-08-10 22:53:36,807 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.867e+01 3.222e+01 3.705e+01 5.709e+01, threshold=6.445e+01, percent-clipped=0.0 2024-08-10 22:53:52,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=786050.0, ans=0.09899494936611666 2024-08-10 22:54:05,743 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6150, loss[loss=0.09184, beats_loss=0.01442, ecapa_loss=0.0002147, whisper_loss=0.07527, over 21323.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01169, ecapa_loss=0.0002217, whisper_loss=0.09414, over 3840382.44 frames. ], batch size: 89, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:54:41,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=786350.0, ans=0.0 2024-08-10 22:55:01,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0 2024-08-10 22:55:13,044 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.133e-02 2024-08-10 22:55:32,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6200, loss[loss=0.09433, beats_loss=0.01322, ecapa_loss=0.0002208, whisper_loss=0.0789, over 21874.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01167, ecapa_loss=0.0002212, whisper_loss=0.09462, over 3860204.81 frames. 
], batch size: 94, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:55:47,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=786750.0, ans=0.1 2024-08-10 22:55:51,222 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 22:56:23,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786950.0, ans=0.1 2024-08-10 22:56:31,729 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.713e+01 3.021e+01 3.323e+01 5.362e+01, threshold=6.041e+01, percent-clipped=0.0 2024-08-10 22:56:36,632 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 22:56:38,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=787050.0, ans=0.04949747468305833 2024-08-10 22:56:57,149 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6250, loss[loss=0.09987, beats_loss=0.0153, ecapa_loss=0.0001685, whisper_loss=0.08288, over 22904.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01177, ecapa_loss=0.0002202, whisper_loss=0.09372, over 3850854.59 frames. 
], batch size: 88, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:57:07,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=787150.0, ans=0.125 2024-08-10 22:57:24,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=787250.0, ans=0.0 2024-08-10 22:57:24,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=787250.0, ans=0.0 2024-08-10 22:57:29,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=787350.0, ans=0.2 2024-08-10 22:57:52,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=787450.0, ans=0.0 2024-08-10 22:58:12,101 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 22:58:12,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=787550.0, ans=0.125 2024-08-10 22:58:17,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=787550.0, ans=0.125 2024-08-10 22:58:20,971 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6300, loss[loss=0.1005, beats_loss=0.01126, ecapa_loss=0.0002467, whisper_loss=0.08681, over 17242.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01175, ecapa_loss=0.000221, whisper_loss=0.09418, over 3856091.19 frames. ], batch size: 70, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:58:36,738 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-10 22:58:44,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.52 vs. 
limit=10.0 2024-08-10 22:58:52,203 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 22:58:54,438 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 22:59:17,385 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.690e-01 2024-08-10 22:59:19,937 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 2.845e+01 3.065e+01 3.583e+01 5.394e+01, threshold=6.129e+01, percent-clipped=0.0 2024-08-10 22:59:23,642 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 22:59:34,013 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.27 vs. limit=15.0 2024-08-10 22:59:35,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=788050.0, ans=0.125 2024-08-10 22:59:41,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=788050.0, ans=0.025 2024-08-10 22:59:44,787 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6350, loss[loss=0.1132, beats_loss=0.01068, ecapa_loss=0.0002461, whisper_loss=0.1, over 22193.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01165, ecapa_loss=0.0002223, whisper_loss=0.09432, over 3848784.46 frames. ], batch size: 90, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:59:55,426 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0 2024-08-10 22:59:56,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.15 vs. 
limit=15.0 2024-08-10 23:00:00,894 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 23:00:24,346 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=22.5 2024-08-10 23:00:30,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.66 vs. limit=22.5 2024-08-10 23:00:40,384 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-10 23:00:44,051 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-10 23:00:47,359 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 23:00:50,539 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 23:01:09,152 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6400, loss[loss=0.1117, beats_loss=0.01045, ecapa_loss=0.0002182, whisper_loss=0.09905, over 22509.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01161, ecapa_loss=0.0002234, whisper_loss=0.09498, over 3876718.73 frames. ], batch size: 89, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:01:32,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=788750.0, ans=0.125 2024-08-10 23:01:34,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2024-08-10 23:01:36,311 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. 
limit=15.0 2024-08-10 23:01:39,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=788750.0, ans=0.025 2024-08-10 23:01:42,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=788850.0, ans=0.125 2024-08-10 23:01:59,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=788950.0, ans=0.125 2024-08-10 23:02:07,226 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.821e+01 3.135e+01 3.560e+01 4.755e+01, threshold=6.269e+01, percent-clipped=0.0 2024-08-10 23:02:12,853 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 31 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 23:02:23,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=789050.0, ans=0.1 2024-08-10 23:02:28,013 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 23:02:32,273 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6450, loss[loss=0.09696, beats_loss=0.01265, ecapa_loss=0.0001669, whisper_loss=0.08265, over 22645.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01151, ecapa_loss=0.0002248, whisper_loss=0.09552, over 3909844.54 frames. ], batch size: 88, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:02:37,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2024-08-10 23:02:39,281 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.83 vs. limit=22.5 2024-08-10 23:02:53,770 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 23:02:57,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=789250.0, ans=0.0 2024-08-10 23:03:18,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2024-08-10 23:03:35,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2024-08-10 23:03:44,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2024-08-10 23:03:49,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=789550.0, ans=0.0 2024-08-10 23:03:51,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=789550.0, ans=0.1 2024-08-10 23:03:54,213 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6500, loss[loss=0.09832, beats_loss=0.01357, ecapa_loss=0.0001748, whisper_loss=0.083, over 13856.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01157, ecapa_loss=0.0002241, whisper_loss=0.09483, over 3911460.81 frames. ], batch size: 55, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:04:16,252 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-10 23:04:29,213 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 26 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-10 23:04:55,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 2.899e+01 3.223e+01 3.887e+01 5.763e+01, threshold=6.447e+01, percent-clipped=0.0 2024-08-10 23:04:58,581 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
27 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 23:05:01,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=790050.0, ans=0.125 2024-08-10 23:05:19,103 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6550, loss[loss=0.1013, beats_loss=0.01305, ecapa_loss=0.0002249, whisper_loss=0.08603, over 21785.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01164, ecapa_loss=0.0002227, whisper_loss=0.09509, over 3939889.79 frames. ], batch size: 90, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:05:27,972 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 23:05:37,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=790250.0, ans=0.1 2024-08-10 23:05:44,774 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 23:05:47,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=790250.0, ans=0.125 2024-08-10 23:06:19,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=790450.0, ans=0.125 2024-08-10 23:06:26,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=790550.0, ans=0.2 2024-08-10 23:06:42,237 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6600, loss[loss=0.09157, beats_loss=0.01189, ecapa_loss=0.0002329, whisper_loss=0.07736, over 20967.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01161, ecapa_loss=0.0002237, whisper_loss=0.09528, over 3922280.69 frames. ], batch size: 89, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:07:00,877 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
19 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-10 23:07:20,202 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 23:07:22,974 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 23:07:33,631 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 23:07:33,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=790950.0, ans=0.05 2024-08-10 23:07:37,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=6.0 2024-08-10 23:07:38,347 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.957e+01 3.211e+01 3.827e+01 6.878e+01, threshold=6.422e+01, percent-clipped=2.0 2024-08-10 23:07:41,823 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 23:07:50,043 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-10 23:07:51,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=791050.0, ans=0.1 2024-08-10 23:08:02,369 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6650, loss[loss=0.1075, beats_loss=0.01117, ecapa_loss=0.0002603, whisper_loss=0.09375, over 22575.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01173, ecapa_loss=0.0002222, whisper_loss=0.09419, over 3939741.11 frames. ], batch size: 90, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:08:05,861 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 23:08:15,174 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 23:09:13,038 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:09:22,953 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6700, loss[loss=0.1072, beats_loss=0.0113, ecapa_loss=0.0002445, whisper_loss=0.09348, over 15692.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01171, ecapa_loss=0.0002212, whisper_loss=0.09457, over 3905004.67 frames. ], batch size: 63, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:09:24,363 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 23:09:36,059 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 23:09:48,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=791750.0, ans=0.125 2024-08-10 23:09:48,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=791750.0, ans=0.1 2024-08-10 23:10:04,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=791850.0, ans=0.0 2024-08-10 23:10:10,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=791950.0, ans=0.05 2024-08-10 23:10:14,065 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 23:10:15,176 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.716e+01 3.104e+01 3.606e+01 5.024e+01, threshold=6.207e+01, percent-clipped=0.0 2024-08-10 23:10:19,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=791950.0, ans=0.0 2024-08-10 23:10:22,219 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.16 vs. limit=15.0 2024-08-10 23:10:36,419 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 23:10:37,756 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6750, loss[loss=0.1109, beats_loss=0.01013, ecapa_loss=0.0002375, whisper_loss=0.0984, over 22277.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01165, ecapa_loss=0.0002223, whisper_loss=0.09463, over 3865544.60 frames. ], batch size: 92, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:10:53,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=792250.0, ans=0.125 2024-08-10 23:11:10,871 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-10 23:11:21,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.86 vs. 
limit=22.5 2024-08-10 23:11:38,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=792550.0, ans=0.2 2024-08-10 23:11:46,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=792550.0, ans=0.05 2024-08-10 23:11:50,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2024-08-10 23:11:54,388 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6800, loss[loss=0.1028, beats_loss=0.01465, ecapa_loss=0.0001673, whisper_loss=0.08645, over 15130.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01165, ecapa_loss=0.0002223, whisper_loss=0.0943, over 3838718.00 frames. ], batch size: 60, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:12:13,043 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 23:12:37,946 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 23:12:41,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=792950.0, ans=0.125 2024-08-10 23:12:43,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=792950.0, ans=10.0 2024-08-10 23:12:46,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 2.823e+01 3.224e+01 3.746e+01 6.225e+01, threshold=6.449e+01, percent-clipped=1.0 2024-08-10 23:12:52,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=792950.0, ans=0.04949747468305833 2024-08-10 23:12:53,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=793050.0, ans=0.125 2024-08-10 23:12:54,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=793050.0, ans=0.1 2024-08-10 23:12:55,969 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 23:12:59,336 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.14 vs. limit=22.5 2024-08-10 23:13:10,776 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6850, loss[loss=0.1003, beats_loss=0.0116, ecapa_loss=0.0001938, whisper_loss=0.08676, over 20038.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01166, ecapa_loss=0.000222, whisper_loss=0.0941, over 3817125.41 frames. 
], batch size: 78, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:13:19,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793150.0, ans=0.1 2024-08-10 23:13:27,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=793250.0, ans=0.125 2024-08-10 23:13:29,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=793250.0, ans=0.0 2024-08-10 23:13:33,937 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 23:13:43,988 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-10 23:14:07,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=793450.0, ans=0.0 2024-08-10 23:14:09,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=793450.0, ans=0.0 2024-08-10 23:14:20,145 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 23:14:28,220 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6900, loss[loss=0.09687, beats_loss=0.01455, ecapa_loss=0.0002164, whisper_loss=0.08016, over 21894.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01172, ecapa_loss=0.0002201, whisper_loss=0.09348, over 3849466.96 frames. ], batch size: 88, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:14:36,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=15.0 2024-08-10 23:14:47,694 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
24 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 23:14:53,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=793750.0, ans=0.0 2024-08-10 23:15:14,214 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 23:15:20,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.839e+01 3.157e+01 3.612e+01 7.302e+01, threshold=6.314e+01, percent-clipped=1.0 2024-08-10 23:15:26,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=794050.0, ans=0.2 2024-08-10 23:15:28,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794050.0, ans=0.1 2024-08-10 23:15:30,627 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 23:15:34,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=794050.0, ans=0.0 2024-08-10 23:15:37,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=794050.0, ans=0.125 2024-08-10 23:15:39,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=794050.0, ans=0.0 2024-08-10 23:15:41,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=794150.0, ans=0.0 2024-08-10 23:15:41,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=794150.0, ans=0.5 2024-08-10 23:15:42,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 6950, loss[loss=0.09347, beats_loss=0.0119, ecapa_loss=0.000211, whisper_loss=0.07945, over 20620.00 frames. 
], tot_loss[loss=0.1079, beats_loss=0.01175, ecapa_loss=0.0002192, whisper_loss=0.09394, over 3881897.86 frames. ], batch size: 82, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:15:49,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794150.0, ans=0.1 2024-08-10 23:15:58,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=794250.0, ans=0.5 2024-08-10 23:16:04,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-10 23:16:11,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=794350.0, ans=0.2 2024-08-10 23:16:24,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=794450.0, ans=0.125 2024-08-10 23:16:31,548 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 23:16:46,340 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.66 vs. limit=22.5 2024-08-10 23:16:57,098 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7000, loss[loss=0.0985, beats_loss=0.01381, ecapa_loss=0.0001394, whisper_loss=0.0833, over 19948.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01176, ecapa_loss=0.0002195, whisper_loss=0.09381, over 3890490.18 frames. 
], batch size: 78, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:17:49,347 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.681e+01 2.983e+01 3.369e+01 6.385e+01, threshold=5.967e+01, percent-clipped=1.0 2024-08-10 23:17:56,410 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.62 vs. limit=22.5 2024-08-10 23:18:08,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.63 vs. limit=15.0 2024-08-10 23:18:10,050 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:18:12,520 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7050, loss[loss=0.1334, beats_loss=0.005683, ecapa_loss=0.0002551, whisper_loss=0.1252, over 16424.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01168, ecapa_loss=0.0002206, whisper_loss=0.09375, over 3846393.64 frames. ], batch size: 61, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:18:28,042 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-10 23:18:48,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795350.0, ans=0.1 2024-08-10 23:18:48,633 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.74 vs. limit=15.0 2024-08-10 23:18:55,228 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 28 from Vox, 16 fro AS 2024-08-10 23:19:14,080 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 23:19:28,817 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7100, loss[loss=0.08861, beats_loss=0.0115, ecapa_loss=0.0002374, whisper_loss=0.07474, over 13542.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01162, ecapa_loss=0.0002224, whisper_loss=0.09363, over 3829702.33 frames. ], batch size: 56, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:19:49,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=795750.0, ans=0.125 2024-08-10 23:19:57,769 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 23:20:14,653 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.989e+00 2024-08-10 23:20:18,056 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=12.0 2024-08-10 23:20:22,923 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.045e+01 2.587e+01 2.924e+01 3.368e+01 5.025e+01, threshold=5.848e+01, percent-clipped=0.0 2024-08-10 23:20:27,864 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-10 23:20:37,851 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 40 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-10 23:20:39,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=796050.0, ans=0.125 2024-08-10 23:20:46,500 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7150, loss[loss=0.1122, beats_loss=0.01097, ecapa_loss=0.0002729, whisper_loss=0.09852, over 20319.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01161, ecapa_loss=0.0002216, whisper_loss=0.09385, over 3874023.10 frames. 
], batch size: 80, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:20:46,702 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 23:20:50,809 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 23:20:56,475 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 33 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 23:21:11,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=796250.0, ans=0.0 2024-08-10 23:21:19,229 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=4.022e-02 2024-08-10 23:21:34,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=796450.0, ans=0.125 2024-08-10 23:21:56,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=796550.0, ans=0.05 2024-08-10 23:21:57,501 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 23:21:58,997 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 23:22:00,081 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7200, loss[loss=0.1133, beats_loss=0.01207, ecapa_loss=0.0002176, whisper_loss=0.09905, over 18658.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01154, ecapa_loss=0.0002218, whisper_loss=0.09389, over 3863256.30 frames. 
], batch size: 73, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:22:00,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=796650.0, ans=0.125 2024-08-10 23:22:14,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=796750.0, ans=0.0 2024-08-10 23:22:17,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=796750.0, ans=0.125 2024-08-10 23:22:19,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0 2024-08-10 23:22:34,455 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 23:22:49,313 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-10 23:22:54,627 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.940e+01 3.360e+01 3.850e+01 6.660e+01, threshold=6.719e+01, percent-clipped=3.0 2024-08-10 23:23:11,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=797050.0, ans=0.125 2024-08-10 23:23:17,504 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7250, loss[loss=0.07915, beats_loss=0.0132, ecapa_loss=0.0001681, whisper_loss=0.06428, over 14678.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01165, ecapa_loss=0.000221, whisper_loss=0.09357, over 3881855.57 frames. ], batch size: 57, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:23:27,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.37 vs. 
limit=15.0 2024-08-10 23:23:48,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=797350.0, ans=0.125 2024-08-10 23:23:55,272 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-10 23:24:03,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=797450.0, ans=0.0 2024-08-10 23:24:04,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=797450.0, ans=0.1 2024-08-10 23:24:08,390 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 23:24:09,969 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 23:24:23,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=797550.0, ans=0.0 2024-08-10 23:24:31,516 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7300, loss[loss=0.1141, beats_loss=0.01132, ecapa_loss=0.0002082, whisper_loss=0.1007, over 20531.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01155, ecapa_loss=0.0002231, whisper_loss=0.09428, over 3873009.12 frames. 
], batch size: 82, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:24:33,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=797650.0, ans=0.0 2024-08-10 23:24:34,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=797650.0, ans=0.125 2024-08-10 23:25:04,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=797850.0, ans=0.0 2024-08-10 23:25:08,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=797850.0, ans=0.125 2024-08-10 23:25:10,360 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 22 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-10 23:25:13,151 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 23:25:19,946 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-10 23:25:22,588 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.736e+01 3.135e+01 3.639e+01 8.330e+01, threshold=6.270e+01, percent-clipped=2.0 2024-08-10 23:25:24,024 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 23:25:25,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=797950.0, ans=0.0 2024-08-10 23:25:27,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=797950.0, ans=0.125 2024-08-10 23:25:29,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=798050.0, ans=0.125 2024-08-10 23:25:43,751 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7350, loss[loss=0.1219, beats_loss=0.01356, ecapa_loss=0.0002298, whisper_loss=0.106, over 21280.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01158, ecapa_loss=0.0002251, whisper_loss=0.09403, over 3909031.65 frames. ], batch size: 89, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:25:52,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=798150.0, ans=0.05 2024-08-10 23:25:53,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=798150.0, ans=0.125 2024-08-10 23:25:53,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=798150.0, ans=0.0 2024-08-10 23:25:55,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=798150.0, ans=0.125 2024-08-10 23:26:01,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=798250.0, ans=0.2 2024-08-10 23:26:18,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=798350.0, ans=0.125 2024-08-10 23:26:21,785 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 
2024-08-10 23:26:23,198 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 23:26:28,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=798450.0, ans=0.05 2024-08-10 23:26:34,629 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-10 23:26:36,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.02 vs. limit=22.5 2024-08-10 23:26:36,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. limit=10.0 2024-08-10 23:26:44,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=798550.0, ans=0.0 2024-08-10 23:26:54,026 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7400, loss[loss=0.1002, beats_loss=0.01499, ecapa_loss=0.00022, whisper_loss=0.08304, over 19205.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0116, ecapa_loss=0.0002246, whisper_loss=0.09442, over 3920950.59 frames. ], batch size: 79, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:26:54,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=798650.0, ans=0.125 2024-08-10 23:27:35,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=798950.0, ans=0.125 2024-08-10 23:27:37,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=798950.0, ans=0.0 2024-08-10 23:27:41,293 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.86 vs. 
limit=22.5 2024-08-10 23:27:42,877 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.647e+01 3.053e+01 3.534e+01 7.826e+01, threshold=6.106e+01, percent-clipped=2.0 2024-08-10 23:28:00,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=799050.0, ans=0.05 2024-08-10 23:28:04,092 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7450, loss[loss=0.1329, beats_loss=0.01061, ecapa_loss=0.0002473, whisper_loss=0.1198, over 17735.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01156, ecapa_loss=0.0002247, whisper_loss=0.09419, over 3908964.33 frames. ], batch size: 72, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:28:04,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=799150.0, ans=0.125 2024-08-10 23:28:26,136 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-10 23:28:27,324 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 23:28:31,471 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 23:28:39,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=799350.0, ans=0.125 2024-08-10 23:28:52,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=799450.0, ans=0.125 2024-08-10 23:29:03,256 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
13 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 23:29:06,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=799550.0, ans=0.2 2024-08-10 23:29:12,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7500, loss[loss=0.1268, beats_loss=0.008712, ecapa_loss=0.0002387, whisper_loss=0.1157, over 23011.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.0116, ecapa_loss=0.0002232, whisper_loss=0.0946, over 3935016.30 frames. ], batch size: 88, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:29:14,180 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-10 23:29:34,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=799750.0, ans=0.0 2024-08-10 23:29:41,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=799850.0, ans=0.125 2024-08-10 23:29:47,979 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 23:29:50,515 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-10 23:29:50,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=799850.0, ans=0.125 2024-08-10 23:30:03,976 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2024-08-10 23:30:04,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.160e+01 2.761e+01 3.186e+01 3.767e+01 5.987e+01, threshold=6.373e+01, percent-clipped=0.0 2024-08-10 23:30:13,786 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.82 vs. 
limit=6.0 2024-08-10 23:30:15,594 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 23:30:25,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7550, loss[loss=0.0994, beats_loss=0.0112, ecapa_loss=0.0002752, whisper_loss=0.08545, over 14703.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.0116, ecapa_loss=0.0002223, whisper_loss=0.09439, over 3904786.20 frames. ], batch size: 59, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:30:31,552 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 23:30:38,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=800150.0, ans=0.125 2024-08-10 23:30:51,472 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 23:30:59,056 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0 2024-08-10 23:31:07,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=800350.0, ans=0.0 2024-08-10 23:31:07,981 INFO [train_multi_KD3.py:844] (2/4) A total of 97 cuts. 26 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-10 23:31:08,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=800450.0, ans=0.125 2024-08-10 23:31:17,741 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2024-08-10 23:31:18,977 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 23:31:23,377 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
20 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 23:31:30,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=800550.0, ans=0.1 2024-08-10 23:31:37,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=800650.0, ans=0.1 2024-08-10 23:31:38,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7600, loss[loss=0.1117, beats_loss=0.01308, ecapa_loss=0.0001982, whisper_loss=0.09662, over 22855.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01164, ecapa_loss=0.0002224, whisper_loss=0.09362, over 3895101.01 frames. ], batch size: 89, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:31:42,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.72 vs. limit=10.0 2024-08-10 23:31:55,024 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 23:31:56,518 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-10 23:32:11,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=800850.0, ans=0.1 2024-08-10 23:32:14,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=800850.0, ans=0.2 2024-08-10 23:32:30,181 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.785e+01 3.066e+01 3.767e+01 8.128e+01, threshold=6.132e+01, percent-clipped=1.0 2024-08-10 23:32:31,656 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 23:32:51,439 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7650, loss[loss=0.1245, beats_loss=0.01181, ecapa_loss=0.0002166, whisper_loss=0.1106, over 22390.00 frames. 
], tot_loss[loss=0.1084, beats_loss=0.01156, ecapa_loss=0.0002234, whisper_loss=0.09456, over 3927761.40 frames. ], batch size: 87, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:33:01,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=801150.0, ans=0.07 2024-08-10 23:33:09,795 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 23:33:15,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=801250.0, ans=0.125 2024-08-10 23:33:23,282 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 23:33:37,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=801450.0, ans=0.2 2024-08-10 23:33:54,205 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 23:33:54,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=801550.0, ans=0.0 2024-08-10 23:34:01,601 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7700, loss[loss=0.1143, beats_loss=0.01256, ecapa_loss=0.0002439, whisper_loss=0.09926, over 21077.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01161, ecapa_loss=0.0002223, whisper_loss=0.09439, over 3940631.13 frames. ], batch size: 88, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:34:03,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=801650.0, ans=0.0 2024-08-10 23:34:03,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.50 vs. 
limit=22.5 2024-08-10 23:34:08,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=801650.0, ans=0.125 2024-08-10 23:34:20,733 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 23:34:32,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=801850.0, ans=0.0 2024-08-10 23:34:39,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=801850.0, ans=0.1 2024-08-10 23:34:46,281 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 36 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 23:34:50,526 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.824e+01 3.342e+01 3.789e+01 5.468e+01, threshold=6.684e+01, percent-clipped=0.0 2024-08-10 23:34:53,292 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 23:35:03,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=802050.0, ans=15.0 2024-08-10 23:35:11,078 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7750, loss[loss=0.08745, beats_loss=0.01189, ecapa_loss=0.0002519, whisper_loss=0.07303, over 12651.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01165, ecapa_loss=0.0002205, whisper_loss=0.09424, over 3899405.89 frames. ], batch size: 55, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:35:19,835 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
14 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 23:35:30,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=802250.0, ans=0.125 2024-08-10 23:35:49,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=802350.0, ans=0.2 2024-08-10 23:35:49,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=802350.0, ans=0.125 2024-08-10 23:36:15,936 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 23:36:25,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7800, loss[loss=0.1192, beats_loss=0.009321, ecapa_loss=0.0002883, whisper_loss=0.107, over 20909.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01163, ecapa_loss=0.0002209, whisper_loss=0.09411, over 3915760.96 frames. ], batch size: 87, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:36:36,747 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 23:36:49,771 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 23:36:51,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=802750.0, ans=0.0 2024-08-10 23:36:54,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=802850.0, ans=0.125 2024-08-10 23:36:54,984 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 23:37:05,658 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
35 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-10 23:37:14,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.895e+01 3.316e+01 3.988e+01 7.505e+01, threshold=6.631e+01, percent-clipped=2.0 2024-08-10 23:37:19,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=802950.0, ans=0.125 2024-08-10 23:37:26,225 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 23:37:35,935 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7850, loss[loss=0.145, beats_loss=0.007504, ecapa_loss=0.0002644, whisper_loss=0.1348, over 18519.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01154, ecapa_loss=0.0002222, whisper_loss=0.09505, over 3908091.30 frames. ], batch size: 70, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:37:40,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=803150.0, ans=0.125 2024-08-10 23:37:48,040 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=12.0 2024-08-10 23:37:51,213 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 23:38:14,318 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 23:38:42,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=803550.0, ans=0.125 2024-08-10 23:38:47,399 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7900, loss[loss=0.1004, beats_loss=0.01407, ecapa_loss=0.0001765, whisper_loss=0.08456, over 22768.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01156, ecapa_loss=0.0002209, whisper_loss=0.09529, over 3892057.69 frames. 
], batch size: 91, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:38:49,527 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=12.0 2024-08-10 23:39:08,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=803750.0, ans=0.125 2024-08-10 23:39:13,288 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 34 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 23:39:26,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=803850.0, ans=0.0 2024-08-10 23:39:37,619 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.898e+01 3.197e+01 3.826e+01 5.899e+01, threshold=6.393e+01, percent-clipped=0.0 2024-08-10 23:39:54,494 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 23:39:56,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=804050.0, ans=0.0 2024-08-10 23:39:58,261 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 7950, loss[loss=0.08929, beats_loss=0.01215, ecapa_loss=0.0002252, whisper_loss=0.07489, over 13733.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01163, ecapa_loss=0.0002218, whisper_loss=0.09456, over 3909866.12 frames. ], batch size: 55, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:40:02,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-10 23:40:03,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=804150.0, ans=0.125 2024-08-10 23:40:10,548 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
20 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-10 23:40:20,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=804250.0, ans=0.1 2024-08-10 23:40:43,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=804450.0, ans=0.1 2024-08-10 23:40:43,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=804450.0, ans=0.125 2024-08-10 23:41:04,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=804650.0, ans=0.125 2024-08-10 23:41:05,849 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8000, loss[loss=0.1035, beats_loss=0.01316, ecapa_loss=0.0002117, whisper_loss=0.08817, over 21831.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01162, ecapa_loss=0.0002219, whisper_loss=0.09437, over 3906519.94 frames. ], batch size: 87, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:41:08,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=804650.0, ans=0.0 2024-08-10 23:41:09,091 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.93 vs. limit=22.5 2024-08-10 23:41:09,814 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 23:41:15,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. 
limit=15.0 2024-08-10 23:41:19,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=804750.0, ans=0.2 2024-08-10 23:41:31,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=804850.0, ans=0.07 2024-08-10 23:41:36,389 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-10 23:41:39,062 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-10 23:41:39,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=804850.0, ans=0.0 2024-08-10 23:41:51,994 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.694e+01 3.160e+01 3.631e+01 6.005e+01, threshold=6.321e+01, percent-clipped=0.0 2024-08-10 23:42:02,410 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.32 vs. limit=6.0 2024-08-10 23:42:03,543 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-08-10 23:42:09,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=805050.0, ans=0.0 2024-08-10 23:42:11,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8050, loss[loss=0.1134, beats_loss=0.008562, ecapa_loss=0.0002643, whisper_loss=0.1022, over 17823.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01159, ecapa_loss=0.000221, whisper_loss=0.09397, over 3880133.19 frames. 
], batch size: 74, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:42:18,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=805150.0, ans=0.125 2024-08-10 23:42:29,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=805250.0, ans=0.125 2024-08-10 23:42:35,984 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 from AS 2024-08-10 23:43:06,369 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 from AS 2024-08-10 23:43:18,201 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8100, loss[loss=0.1285, beats_loss=0.01126, ecapa_loss=0.0001955, whisper_loss=0.1153, over 21734.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01159, ecapa_loss=0.0002218, whisper_loss=0.09399, over 3879143.34 frames. ], batch size: 83, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:43:29,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805650.0, ans=0.1 2024-08-10 23:43:32,858 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 from AS 2024-08-10 23:43:36,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=805750.0, ans=0.125 2024-08-10 23:43:40,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=805750.0, ans=0.125 2024-08-10 23:43:46,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=12.0 2024-08-10 23:44:01,544 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
27 from LS+wenet, 22 from Vox, 38 from AS 2024-08-10 23:44:04,017 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.753e+01 3.174e+01 3.635e+01 5.123e+01, threshold=6.349e+01, percent-clipped=0.0 2024-08-10 23:44:10,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=806050.0, ans=0.125 2024-08-10 23:44:10,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=806050.0, ans=0.2 2024-08-10 23:44:11,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=806050.0, ans=0.1 2024-08-10 23:44:12,598 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 from AS 2024-08-10 23:44:17,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=806050.0, ans=0.125 2024-08-10 23:44:24,388 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8150, loss[loss=0.1076, beats_loss=0.01049, ecapa_loss=0.0002449, whisper_loss=0.0947, over 13846.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01159, ecapa_loss=0.0002228, whisper_loss=0.09377, over 3873097.63 frames. ], batch size: 55, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:44:38,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=806250.0, ans=0.125 2024-08-10 23:44:49,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.36 vs.
limit=22.5 2024-08-10 23:45:15,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=806450.0, ans=0.1 2024-08-10 23:45:24,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=806550.0, ans=0.0 2024-08-10 23:45:27,986 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 18 from Vox, 35 from AS 2024-08-10 23:45:30,516 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8200, loss[loss=0.09337, beats_loss=0.01542, ecapa_loss=0.0002295, whisper_loss=0.07566, over 18967.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01154, ecapa_loss=0.0002223, whisper_loss=0.09426, over 3886699.79 frames. ], batch size: 81, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:45:30,692 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 16 from Vox, 29 from AS 2024-08-10 23:45:39,618 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 31 from LS+wenet, 19 from Vox, 28 from AS 2024-08-10 23:45:44,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.99 vs. limit=22.5 2024-08-10 23:45:45,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.41 vs. limit=15.0 2024-08-10 23:46:03,432 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 from AS 2024-08-10 23:46:04,739 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 16 from Vox, 41 from AS 2024-08-10 23:46:08,920 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts.
21 from LS+wenet, 27 from Vox, 29 from AS 2024-08-10 23:46:10,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=806950.0, ans=0.125 2024-08-10 23:46:16,296 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.784e+01 3.124e+01 3.627e+01 5.044e+01, threshold=6.248e+01, percent-clipped=0.0 2024-08-10 23:46:33,811 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 20 from LS+wenet, 30 from Vox, 43 from AS 2024-08-10 23:46:36,066 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8250, loss[loss=0.08804, beats_loss=0.01369, ecapa_loss=0.0002083, whisper_loss=0.07226, over 21685.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01166, ecapa_loss=0.0002207, whisper_loss=0.09354, over 3880367.49 frames. ], batch size: 90, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:46:36,220 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 from AS 2024-08-10 23:46:49,464 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 from AS 2024-08-10 23:46:54,834 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts.
26 from LS+wenet, 11 from Vox, 26 from AS 2024-08-10 23:46:56,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=807250.0, ans=0.2 2024-08-10 23:46:58,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=807250.0, ans=0.2 2024-08-10 23:47:03,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=807350.0, ans=0.125 2024-08-10 23:47:05,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=807350.0, ans=0.95 2024-08-10 23:47:06,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=807350.0, ans=0.0 2024-08-10 23:47:09,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=807350.0, ans=0.0 2024-08-10 23:47:10,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=807350.0, ans=0.125 2024-08-10 23:47:18,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=807450.0, ans=0.125 2024-08-10 23:47:19,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.33 vs. limit=8.0 2024-08-10 23:47:19,931 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 from AS 2024-08-10 23:47:25,207 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 from AS 2024-08-10 23:47:28,776 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.97 vs.
limit=22.5 2024-08-10 23:47:29,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=807550.0, ans=0.125 2024-08-10 23:47:32,015 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 22 from Vox, 24 from AS 2024-08-10 23:47:33,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=807550.0, ans=0.0 2024-08-10 23:47:42,450 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8300, loss[loss=0.117, beats_loss=0.01395, ecapa_loss=0.0001845, whisper_loss=0.1012, over 23493.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.0116, ecapa_loss=0.0002215, whisper_loss=0.0934, over 3897583.61 frames. ], batch size: 94, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:47:55,953 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 26 from Vox, 23 from AS 2024-08-10 23:47:57,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=807750.0, ans=0.2 2024-08-10 23:48:03,713 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 from AS 2024-08-10 23:48:05,419 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 from AS 2024-08-10 23:48:22,301 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 from AS 2024-08-10 23:48:29,183 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.802e+01 3.186e+01 3.664e+01 3.254e+02, threshold=6.372e+01, percent-clipped=4.0 2024-08-10 23:48:34,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.81 vs.
limit=15.0 2024-08-10 23:48:48,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8350, loss[loss=0.108, beats_loss=0.01479, ecapa_loss=0.0001802, whisper_loss=0.09144, over 16541.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01154, ecapa_loss=0.0002222, whisper_loss=0.09409, over 3896548.59 frames. ], batch size: 67, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:48:50,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=808150.0, ans=0.0 2024-08-10 23:48:51,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=808150.0, ans=0.0 2024-08-10 23:49:10,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2024-08-10 23:49:16,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.15 vs. limit=22.5 2024-08-10 23:49:23,830 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 10 from Vox, 28 from AS 2024-08-10 23:49:35,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808450.0, ans=0.1 2024-08-10 23:49:43,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=808550.0, ans=0.0 2024-08-10 23:49:45,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=808550.0, ans=0.125 2024-08-10 23:49:53,579 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8400, loss[loss=0.08358, beats_loss=0.01391, ecapa_loss=0.0001874, whisper_loss=0.0678, over 21171.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01153, ecapa_loss=0.0002217, whisper_loss=0.09424, over 3918700.25 frames.
], batch size: 84, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:50:14,998 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 from AS 2024-08-10 23:50:17,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=808750.0, ans=0.125 2024-08-10 23:50:34,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=808950.0, ans=0.125 2024-08-10 23:50:39,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.636e+01 3.039e+01 3.423e+01 5.250e+01, threshold=6.078e+01, percent-clipped=0.0 2024-08-10 23:50:44,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=809050.0, ans=0.125 2024-08-10 23:50:51,610 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 28 from Vox, 36 from AS 2024-08-10 23:50:59,300 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8450, loss[loss=0.09943, beats_loss=0.01411, ecapa_loss=0.0001469, whisper_loss=0.08385, over 16609.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01149, ecapa_loss=0.0002215, whisper_loss=0.09425, over 3884039.42 frames. ], batch size: 63, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:51:05,445 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=15.0 2024-08-10 23:51:12,995 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
27 from LS+wenet, 26 from Vox, 35 from AS 2024-08-10 23:51:13,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=809250.0, ans=0.125 2024-08-10 23:51:16,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=809250.0, ans=0.2 2024-08-10 23:51:17,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=809250.0, ans=0.2 2024-08-10 23:51:21,725 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 from AS 2024-08-10 23:51:23,186 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 18 from LS+wenet, 28 from Vox, 35 from AS 2024-08-10 23:51:40,371 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 from AS 2024-08-10 23:51:42,263 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.250e-01 2024-08-10 23:52:06,390 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8500, loss[loss=0.08771, beats_loss=0.01574, ecapa_loss=0.00016, whisper_loss=0.07036, over 22387.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01157, ecapa_loss=0.0002205, whisper_loss=0.09352, over 3902802.48 frames. ], batch size: 94, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:52:08,562 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts.
17 from LS+wenet, 20 from Vox, 21 from AS 2024-08-10 23:52:26,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=809750.0, ans=0.0 2024-08-10 23:52:38,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=809850.0, ans=0.125 2024-08-10 23:52:47,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=809850.0, ans=0.125 2024-08-10 23:52:59,681 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.787e+01 3.102e+01 3.651e+01 5.135e+01, threshold=6.204e+01, percent-clipped=0.0 2024-08-10 23:53:01,095 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 from AS 2024-08-10 23:53:10,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=810050.0, ans=0.125 2024-08-10 23:53:21,953 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8550, loss[loss=0.1118, beats_loss=0.01123, ecapa_loss=0.0002239, whisper_loss=0.09834, over 18693.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01157, ecapa_loss=0.0002206, whisper_loss=0.09368, over 3903309.84 frames.
], batch size: 73, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:53:54,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=810350.0, ans=0.125 2024-08-10 23:54:02,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=810350.0, ans=10.0 2024-08-10 23:54:11,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=810450.0, ans=0.125 2024-08-10 23:54:16,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=810450.0, ans=0.05 2024-08-10 23:54:19,278 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 from AS 2024-08-10 23:54:32,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=810550.0, ans=0.125 2024-08-10 23:54:34,504 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8600, loss[loss=0.1097, beats_loss=0.01209, ecapa_loss=0.000269, whisper_loss=0.09491, over 21586.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01163, ecapa_loss=0.0002204, whisper_loss=0.09366, over 3893047.34 frames.
], batch size: 92, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:54:46,751 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:54:48,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=810750.0, ans=0.0 2024-08-10 23:55:03,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=810850.0, ans=0.0 2024-08-10 23:55:06,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=810850.0, ans=0.125 2024-08-10 23:55:21,632 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.42 vs. limit=10.0 2024-08-10 23:55:23,994 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.846e+01 3.382e+01 3.840e+01 6.128e+01, threshold=6.764e+01, percent-clipped=0.0 2024-08-10 23:55:26,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.57 vs. limit=15.0 2024-08-10 23:55:32,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2024-08-10 23:55:34,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=811050.0, ans=0.125 2024-08-10 23:55:44,471 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8650, loss[loss=0.1143, beats_loss=0.008581, ecapa_loss=0.0002318, whisper_loss=0.1034, over 14332.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01154, ecapa_loss=0.0002228, whisper_loss=0.09351, over 3859627.83 frames. 
], batch size: 55, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:55:58,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5 2024-08-10 23:56:09,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=811250.0, ans=0.125 2024-08-10 23:56:14,265 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 17 from Vox, 40 from AS 2024-08-10 23:56:18,138 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 16 from Vox, 22 from AS 2024-08-10 23:56:22,211 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.36 vs. limit=10.0 2024-08-10 23:56:22,371 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.96 vs. limit=22.5 2024-08-10 23:56:23,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=811350.0, ans=0.1 2024-08-10 23:56:39,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.80 vs. limit=22.5 2024-08-10 23:56:55,213 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8700, loss[loss=0.1244, beats_loss=0.009566, ecapa_loss=0.0002591, whisper_loss=0.1123, over 19584.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01153, ecapa_loss=0.0002226, whisper_loss=0.09364, over 3859965.24 frames.
], batch size: 81, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:57:02,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=811650.0, ans=0.125 2024-08-10 23:57:03,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=811650.0, ans=10.0 2024-08-10 23:57:10,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=811750.0, ans=0.05 2024-08-10 23:57:27,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=811850.0, ans=0.125 2024-08-10 23:57:43,676 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.694e+01 2.974e+01 3.412e+01 6.571e+01, threshold=5.947e+01, percent-clipped=0.0 2024-08-10 23:57:43,848 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 13 from LS+wenet, 24 from Vox, 32 from AS 2024-08-10 23:58:00,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=812050.0, ans=0.2 2024-08-10 23:58:02,958 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 from AS 2024-08-10 23:58:04,120 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8750, loss[loss=0.09925, beats_loss=0.01188, ecapa_loss=0.0001694, whisper_loss=0.08568, over 18300.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01154, ecapa_loss=0.0002224, whisper_loss=0.09349, over 3842545.97 frames. ], batch size: 71, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:58:05,732 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 from AS 2024-08-10 23:58:12,627 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs.
limit=15.0 2024-08-10 23:58:21,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.05 vs. limit=15.0 2024-08-10 23:58:25,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=812250.0, ans=0.125 2024-08-10 23:58:28,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=812250.0, ans=0.125 2024-08-10 23:58:32,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.08 vs. limit=10.0 2024-08-10 23:58:33,774 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.40 vs. limit=15.0 2024-08-10 23:58:34,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.72 vs. limit=22.5 2024-08-10 23:58:35,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=812350.0, ans=10.0 2024-08-10 23:58:55,193 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 from AS 2024-08-10 23:59:11,246 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 from AS 2024-08-10 23:59:12,316 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8800, loss[loss=0.09978, beats_loss=0.01175, ecapa_loss=0.0002524, whisper_loss=0.08551, over 16771.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01165, ecapa_loss=0.0002198, whisper_loss=0.09366, over 3851316.04 frames. ], batch size: 68, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:59:12,547 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
26 from LS+wenet, 16 from Vox, 50 from AS 2024-08-10 23:59:22,158 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 from AS 2024-08-10 23:59:29,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812750.0, ans=0.1 2024-08-10 23:59:37,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=812750.0, ans=0.2 2024-08-10 23:59:54,544 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 23:59:59,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.902e+01 3.394e+01 3.776e+01 5.499e+01, threshold=6.788e+01, percent-clipped=0.0 2024-08-11 00:00:21,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8850, loss[loss=0.09482, beats_loss=0.01322, ecapa_loss=0.0002122, whisper_loss=0.07947, over 13663.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01171, ecapa_loss=0.0002194, whisper_loss=0.0936, over 3832242.25 frames. ], batch size: 54, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:00:30,293 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.888e-02 2024-08-11 00:00:48,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=813350.0, ans=0.04949747468305833 2024-08-11 00:00:50,943 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 from AS 2024-08-11 00:00:51,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=813350.0, ans=0.125 2024-08-11 00:01:19,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs.
limit=15.0 2024-08-11 00:01:30,499 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8900, loss[loss=0.08617, beats_loss=0.01322, ecapa_loss=0.0001938, whisper_loss=0.07101, over 16694.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01168, ecapa_loss=0.0002201, whisper_loss=0.09346, over 3826652.98 frames. ], batch size: 65, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:01:36,460 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 from AS 2024-08-11 00:01:42,868 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 from AS 2024-08-11 00:01:44,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=813750.0, ans=0.125 2024-08-11 00:01:56,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.27 vs. limit=22.5 2024-08-11 00:01:59,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=813850.0, ans=0.125 2024-08-11 00:02:01,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=12.0 2024-08-11 00:02:03,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=813850.0, ans=0.2 2024-08-11 00:02:05,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=813850.0, ans=0.125 2024-08-11 00:02:10,481 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.23 vs. limit=10.0 2024-08-11 00:02:17,033 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
30 from LS+wenet, 23 from Vox, 38 from AS 2024-08-11 00:02:18,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.661e+01 2.983e+01 3.454e+01 5.391e+01, threshold=5.966e+01, percent-clipped=0.0 2024-08-11 00:02:20,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=813950.0, ans=0.2 2024-08-11 00:02:20,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=813950.0, ans=0.125 2024-08-11 00:02:23,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814050.0, ans=0.1 2024-08-11 00:02:25,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=814050.0, ans=0.125 2024-08-11 00:02:37,936 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 8950, loss[loss=0.1202, beats_loss=0.01155, ecapa_loss=0.0002222, whisper_loss=0.1064, over 22122.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01173, ecapa_loss=0.0002185, whisper_loss=0.09274, over 3846814.40 frames. ], batch size: 91, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:02:50,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814250.0, ans=0.1 2024-08-11 00:03:24,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=814450.0, ans=0.2 2024-08-11 00:03:28,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=814450.0, ans=0.0 2024-08-11 00:03:35,782 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.42 vs.
limit=15.0 2024-08-11 00:03:44,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9000, loss[loss=0.1142, beats_loss=0.01097, ecapa_loss=0.0002255, whisper_loss=0.1009, over 16482.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0117, ecapa_loss=0.0002175, whisper_loss=0.09286, over 3844277.02 frames. ], batch size: 64, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:03:44,142 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 00:04:24,130 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on ASR_libri: loss=0.2598, beats_loss=0, ecapa_loss=0.0006942, whisper_loss=0.2529, over 922467.00 frames. 2024-08-11 00:04:43,498 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on SV_voxceleb1: loss=0.005764, beats_loss=0, ecapa_loss=0.0005764, whisper_loss=0, over 939242.00 frames. 2024-08-11 00:05:51,170 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.2028, 2.9603, 3.1997, 3.0111], device='cuda:2') 2024-08-11 00:06:37,865 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on AT_audioset: loss=0.02592, beats_loss=0.02592, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 00:06:37,868 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 00:06:38,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=814650.0, ans=0.125 2024-08-11 00:06:38,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=814650.0, ans=0.2 2024-08-11 00:06:46,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=814650.0, ans=0.125 2024-08-11 00:06:51,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. 
limit=15.0 2024-08-11 00:07:10,748 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 00:07:30,717 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.866e+01 3.382e+01 4.145e+01 7.682e+01, threshold=6.764e+01, percent-clipped=3.0 2024-08-11 00:07:30,944 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 00:07:49,155 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-11 00:07:54,294 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9050, loss[loss=0.1136, beats_loss=0.01285, ecapa_loss=0.0001612, whisper_loss=0.09912, over 21916.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01161, ecapa_loss=0.000218, whisper_loss=0.09298, over 3802841.38 frames. ], batch size: 84, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:07:58,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=815150.0, ans=0.125 2024-08-11 00:07:59,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=815150.0, ans=0.035 2024-08-11 00:08:03,753 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 00:08:11,167 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-11 00:08:16,351 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
26 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 00:08:30,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=815350.0, ans=0.125 2024-08-11 00:08:34,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=815350.0, ans=0.2 2024-08-11 00:08:34,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=815350.0, ans=0.2 2024-08-11 00:08:37,259 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 00:08:37,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0 2024-08-11 00:08:51,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=815450.0, ans=0.125 2024-08-11 00:09:02,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=815550.0, ans=0.0 2024-08-11 00:09:03,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815550.0, ans=0.1 2024-08-11 00:09:08,067 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9100, loss[loss=0.1086, beats_loss=0.01014, ecapa_loss=0.0002738, whisper_loss=0.09573, over 20289.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01147, ecapa_loss=0.0002203, whisper_loss=0.09348, over 3802968.98 frames. ], batch size: 86, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:09:20,370 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.118e-01 2024-08-11 00:09:29,963 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 00:09:34,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=815750.0, ans=0.025 2024-08-11 00:09:48,535 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 00:09:55,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=815950.0, ans=0.0 2024-08-11 00:09:55,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=815950.0, ans=0.125 2024-08-11 00:09:58,488 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.712e+01 2.999e+01 3.385e+01 5.028e+01, threshold=5.998e+01, percent-clipped=0.0 2024-08-11 00:10:03,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=815950.0, ans=0.2 2024-08-11 00:10:04,697 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 00:10:20,702 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9150, loss[loss=0.1336, beats_loss=0.01006, ecapa_loss=0.0002774, whisper_loss=0.1208, over 17077.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01147, ecapa_loss=0.0002191, whisper_loss=0.0937, over 3802773.08 frames. ], batch size: 73, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:10:36,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=816250.0, ans=0.5 2024-08-11 00:10:52,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816350.0, ans=0.1 2024-08-11 00:10:53,518 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 00:11:10,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-11 00:11:36,474 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9200, loss[loss=0.1076, beats_loss=0.0107, ecapa_loss=0.000232, whisper_loss=0.0946, over 20490.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01152, ecapa_loss=0.0002218, whisper_loss=0.09373, over 3802817.80 frames. ], batch size: 83, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:11:48,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=816650.0, ans=0.0 2024-08-11 00:11:54,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=816750.0, ans=0.125 2024-08-11 00:12:06,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0 2024-08-11 00:12:07,827 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 00:12:15,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=816850.0, ans=0.125 2024-08-11 00:12:15,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816850.0, ans=0.1 2024-08-11 00:12:19,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=816950.0, ans=0.125 2024-08-11 00:12:24,156 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
25 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-11 00:12:27,758 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.632e+01 3.033e+01 3.497e+01 1.383e+02, threshold=6.066e+01, percent-clipped=1.0 2024-08-11 00:12:45,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=817050.0, ans=0.2 2024-08-11 00:12:48,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9250, loss[loss=0.1024, beats_loss=0.01082, ecapa_loss=0.0002707, whisper_loss=0.08886, over 21391.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01153, ecapa_loss=0.0002225, whisper_loss=0.0935, over 3801043.27 frames. ], batch size: 90, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:12:57,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=817150.0, ans=0.125 2024-08-11 00:13:05,467 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 00:13:10,058 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 00:13:14,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=817250.0, ans=0.125 2024-08-11 00:13:16,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=817250.0, ans=0.125 2024-08-11 00:13:18,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=817350.0, ans=0.2 2024-08-11 00:13:34,767 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs. limit=6.0 2024-08-11 00:13:35,683 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
18 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 00:13:36,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2024-08-11 00:13:48,117 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 00:14:02,182 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 18 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-11 00:14:06,675 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9300, loss[loss=0.1096, beats_loss=0.01176, ecapa_loss=0.0001799, whisper_loss=0.09603, over 22823.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01145, ecapa_loss=0.0002212, whisper_loss=0.09351, over 3785192.49 frames. ], batch size: 90, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:14:20,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2024-08-11 00:14:49,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.88 vs. limit=15.0 2024-08-11 00:14:58,220 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.667e+01 2.966e+01 3.383e+01 7.144e+01, threshold=5.931e+01, percent-clipped=1.0 2024-08-11 00:14:58,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=817950.0, ans=0.125 2024-08-11 00:15:00,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=12.0 2024-08-11 00:15:10,717 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 00:15:11,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=818050.0, ans=0.125 2024-08-11 00:15:12,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=818050.0, ans=0.0 2024-08-11 00:15:14,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=818050.0, ans=15.0 2024-08-11 00:15:19,586 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9350, loss[loss=0.1129, beats_loss=0.01214, ecapa_loss=0.0002027, whisper_loss=0.0987, over 17298.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01145, ecapa_loss=0.0002211, whisper_loss=0.09515, over 3824530.24 frames. ], batch size: 66, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:15:31,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=818150.0, ans=0.05 2024-08-11 00:15:42,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=818250.0, ans=0.125 2024-08-11 00:15:57,293 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.76 vs. limit=10.0 2024-08-11 00:16:04,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0 2024-08-11 00:16:08,408 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 00:16:22,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=818550.0, ans=0.125 2024-08-11 00:16:32,267 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9400, loss[loss=0.137, beats_loss=0.01246, ecapa_loss=0.0002191, whisper_loss=0.1223, over 17945.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01145, ecapa_loss=0.0002201, whisper_loss=0.0956, over 3871366.40 frames. ], batch size: 73, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:16:45,898 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 00:16:49,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=818750.0, ans=0.0 2024-08-11 00:17:00,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=818850.0, ans=0.0 2024-08-11 00:17:08,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=818850.0, ans=0.125 2024-08-11 00:17:20,355 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.820e+01 3.162e+01 3.777e+01 5.486e+01, threshold=6.323e+01, percent-clipped=0.0 2024-08-11 00:17:27,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819050.0, ans=0.1 2024-08-11 00:17:41,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9450, loss[loss=0.1056, beats_loss=0.01338, ecapa_loss=0.0002188, whisper_loss=0.09001, over 20913.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01152, ecapa_loss=0.0002198, whisper_loss=0.09539, over 3892107.80 frames. ], batch size: 87, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:17:42,683 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
23 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 00:17:52,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=819150.0, ans=0.125 2024-08-11 00:18:12,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=819350.0, ans=0.125 2024-08-11 00:18:18,881 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=12.0 2024-08-11 00:18:43,408 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-11 00:18:48,724 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9500, loss[loss=0.1266, beats_loss=0.01048, ecapa_loss=0.0001918, whisper_loss=0.1142, over 20223.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01149, ecapa_loss=0.0002196, whisper_loss=0.09607, over 3900512.12 frames. ], batch size: 73, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:18:51,723 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 00:18:57,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=819650.0, ans=0.125 2024-08-11 00:19:03,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=819750.0, ans=0.125 2024-08-11 00:19:14,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=819750.0, ans=0.125 2024-08-11 00:19:37,264 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.201e+01 2.851e+01 3.283e+01 3.927e+01 7.522e+01, threshold=6.566e+01, percent-clipped=2.0 2024-08-11 00:19:38,766 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 00:19:39,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819950.0, ans=0.1 2024-08-11 00:19:51,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=820050.0, ans=0.025 2024-08-11 00:19:52,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.78 vs. limit=15.0 2024-08-11 00:19:52,745 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 00:19:58,517 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9550, loss[loss=0.1221, beats_loss=0.007522, ecapa_loss=0.0002793, whisper_loss=0.1118, over 14184.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01151, ecapa_loss=0.0002213, whisper_loss=0.09469, over 3872839.43 frames. ], batch size: 55, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:20:01,319 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 00:20:10,505 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 00:20:24,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=820350.0, ans=0.035 2024-08-11 00:20:25,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=820350.0, ans=0.09899494936611666 2024-08-11 00:20:46,185 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
32 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 00:20:51,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=820550.0, ans=0.125 2024-08-11 00:20:58,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=820550.0, ans=0.125 2024-08-11 00:21:04,549 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9600, loss[loss=0.1088, beats_loss=0.01016, ecapa_loss=0.0002502, whisper_loss=0.09614, over 17459.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01151, ecapa_loss=0.0002196, whisper_loss=0.09508, over 3870641.46 frames. ], batch size: 73, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:21:13,825 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-11 00:21:15,153 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 00:21:50,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.676e+01 3.117e+01 3.565e+01 7.658e+01, threshold=6.234e+01, percent-clipped=1.0 2024-08-11 00:21:55,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.02 vs. limit=15.0 2024-08-11 00:21:57,729 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-11 00:22:07,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=821050.0, ans=0.125 2024-08-11 00:22:10,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9650, loss[loss=0.09282, beats_loss=0.01029, ecapa_loss=0.0002194, whisper_loss=0.08033, over 19395.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01145, ecapa_loss=0.0002198, whisper_loss=0.09486, over 3850750.89 frames. 
], batch size: 78, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:22:12,271 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-11 00:22:24,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=821250.0, ans=0.0 2024-08-11 00:22:25,784 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 00:22:26,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=821250.0, ans=0.125 2024-08-11 00:23:02,294 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 34 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-11 00:23:11,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=821550.0, ans=0.0 2024-08-11 00:23:16,419 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9700, loss[loss=0.1034, beats_loss=0.008761, ecapa_loss=0.0002509, whisper_loss=0.09211, over 13630.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01148, ecapa_loss=0.0002194, whisper_loss=0.09468, over 3866923.69 frames. ], batch size: 55, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:23:22,195 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 00:23:28,964 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 00:23:37,248 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.463e-01 2024-08-11 00:23:39,766 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
34 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-11 00:23:44,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=821850.0, ans=0.0 2024-08-11 00:23:56,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=821950.0, ans=0.125 2024-08-11 00:24:02,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=821950.0, ans=0.0 2024-08-11 00:24:02,886 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.803e+01 3.195e+01 3.718e+01 6.974e+01, threshold=6.391e+01, percent-clipped=1.0 2024-08-11 00:24:17,743 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=7.507e-02 2024-08-11 00:24:22,392 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9750, loss[loss=0.1026, beats_loss=0.01155, ecapa_loss=0.0002205, whisper_loss=0.08881, over 21897.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01153, ecapa_loss=0.0002177, whisper_loss=0.09449, over 3863845.88 frames. ], batch size: 92, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:24:23,769 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 00:24:23,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822150.0, ans=0.1 2024-08-11 00:24:25,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=822150.0, ans=0.125 2024-08-11 00:24:34,377 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 00:24:46,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.80 vs. 
limit=10.0 2024-08-11 00:24:49,392 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 00:24:59,390 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 00:25:02,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.95 vs. limit=15.0 2024-08-11 00:25:08,238 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-11 00:25:16,668 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0 2024-08-11 00:25:26,181 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9800, loss[loss=0.1262, beats_loss=0.01092, ecapa_loss=0.0001917, whisper_loss=0.1134, over 20214.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01146, ecapa_loss=0.0002168, whisper_loss=0.09534, over 3865367.96 frames. ], batch size: 77, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:25:34,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822650.0, ans=0.1 2024-08-11 00:25:36,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=822650.0, ans=0.2 2024-08-11 00:25:36,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822650.0, ans=0.1 2024-08-11 00:25:45,843 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2024-08-11 00:25:59,001 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.31 vs. 
limit=15.0 2024-08-11 00:26:12,133 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.728e+01 3.058e+01 3.533e+01 7.097e+01, threshold=6.116e+01, percent-clipped=1.0 2024-08-11 00:26:12,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=822950.0, ans=0.0 2024-08-11 00:26:20,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=823050.0, ans=0.0 2024-08-11 00:26:25,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=823050.0, ans=0.125 2024-08-11 00:26:31,799 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9850, loss[loss=0.1106, beats_loss=0.01113, ecapa_loss=0.0002256, whisper_loss=0.09723, over 22507.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01151, ecapa_loss=0.0002168, whisper_loss=0.09486, over 3885614.40 frames. ], batch size: 93, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:26:54,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=823250.0, ans=0.125 2024-08-11 00:26:58,205 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 00:27:03,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=823350.0, ans=0.0 2024-08-11 00:27:09,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=823450.0, ans=0.125 2024-08-11 00:27:13,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=823450.0, ans=0.125 2024-08-11 00:27:17,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.20 vs. 
limit=6.0 2024-08-11 00:27:19,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=823450.0, ans=0.1 2024-08-11 00:27:37,738 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9900, loss[loss=0.107, beats_loss=0.01109, ecapa_loss=0.0002386, whisper_loss=0.09356, over 19398.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01153, ecapa_loss=0.0002173, whisper_loss=0.09427, over 3902856.18 frames. ], batch size: 78, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:27:39,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=15.0 2024-08-11 00:27:42,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.67 vs. limit=22.5 2024-08-11 00:27:43,722 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.00 vs. limit=22.5 2024-08-11 00:27:49,365 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 00:28:06,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=823850.0, ans=0.0 2024-08-11 00:28:08,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=823850.0, ans=0.125 2024-08-11 00:28:23,790 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.723e+01 2.993e+01 3.476e+01 9.466e+01, threshold=5.985e+01, percent-clipped=1.0 2024-08-11 00:28:34,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=824050.0, ans=0.125 2024-08-11 00:28:43,476 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 9950, loss[loss=0.1033, beats_loss=0.01337, ecapa_loss=0.0002041, whisper_loss=0.08786, over 21686.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01156, ecapa_loss=0.0002182, whisper_loss=0.09412, over 3896926.95 frames. ], batch size: 89, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:28:43,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=824150.0, ans=0.0 2024-08-11 00:28:48,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=15.0 2024-08-11 00:29:00,529 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 00:29:09,672 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
14 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 00:29:10,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824350.0, ans=0.1 2024-08-11 00:29:20,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=824350.0, ans=0.125 2024-08-11 00:29:27,910 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 00:29:45,862 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 00:29:47,159 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 00:29:48,203 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10000, loss[loss=0.09579, beats_loss=0.01145, ecapa_loss=0.0002558, whisper_loss=0.08178, over 17007.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01158, ecapa_loss=0.0002174, whisper_loss=0.09423, over 3885318.21 frames. ], batch size: 72, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:29:48,387 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-11 00:29:56,209 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 00:30:37,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.691e+01 3.032e+01 3.574e+01 5.004e+01, threshold=6.065e+01, percent-clipped=0.0 2024-08-11 00:30:43,947 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. 
limit=15.0 2024-08-11 00:30:47,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=825050.0, ans=0.125 2024-08-11 00:30:51,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=825050.0, ans=0.125 2024-08-11 00:30:56,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10050, loss[loss=0.1141, beats_loss=0.009857, ecapa_loss=0.000202, whisper_loss=0.1022, over 23815.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01156, ecapa_loss=0.0002171, whisper_loss=0.09478, over 3913625.22 frames. ], batch size: 90, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:31:02,881 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 from AS 2024-08-11 00:31:03,993 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 from AS 2024-08-11 00:31:06,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=825150.0, ans=0.0 2024-08-11 00:31:11,609 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
30 from LS+wenet, 19 from Vox, 43 from AS 2024-08-11 00:31:36,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825350.0, ans=0.1 2024-08-11 00:31:36,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=825350.0, ans=0.0 2024-08-11 00:31:37,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=825350.0, ans=0.025 2024-08-11 00:31:45,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=825350.0, ans=0.0 2024-08-11 00:31:52,639 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:31:53,934 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 from AS 2024-08-11 00:31:54,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825450.0, ans=0.1 2024-08-11 00:31:54,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=825450.0, ans=0.1 2024-08-11 00:32:12,199 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 13 from Vox, 31 from AS 2024-08-11 00:32:35,422 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10100, loss[loss=0.1179, beats_loss=0.01199, ecapa_loss=0.0002666, whisper_loss=0.1033, over 17591.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01161, ecapa_loss=0.0002181, whisper_loss=0.0946, over 3915463.49 frames. ], batch size: 72, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:32:54,229 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 23 from Vox, 45 from AS 2024-08-11 00:33:18,937 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
24 from LS+wenet, 17 from Vox, 23 from AS 2024-08-11 00:33:30,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=825850.0, ans=0.5 2024-08-11 00:33:33,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=825850.0, ans=0.125 2024-08-11 00:33:43,869 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 11 from LS+wenet, 33 from Vox, 28 from AS 2024-08-11 00:33:55,013 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.818e+01 3.128e+01 3.591e+01 5.480e+01, threshold=6.256e+01, percent-clipped=0.0 2024-08-11 00:34:33,665 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 from AS 2024-08-11 00:34:34,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10150, loss[loss=0.1074, beats_loss=0.01024, ecapa_loss=0.0002224, whisper_loss=0.09497, over 18826.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01159, ecapa_loss=0.0002189, whisper_loss=0.09446, over 3935546.66 frames. ], batch size: 75, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:34:46,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=826150.0, ans=0.0 2024-08-11 00:34:59,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=826250.0, ans=0.125 2024-08-11 00:35:10,780 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 32 from Vox, 31 from AS 2024-08-11 00:35:12,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=826250.0, ans=0.2 2024-08-11 00:35:20,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.37 vs. 
limit=22.5 2024-08-11 00:35:28,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=826350.0, ans=0.1 2024-08-11 00:36:03,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=826450.0, ans=0.0 2024-08-11 00:36:21,152 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 25 from Vox, 35 from AS 2024-08-11 00:36:37,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10200, loss[loss=0.09634, beats_loss=0.01215, ecapa_loss=0.0002063, whisper_loss=0.08213, over 17816.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01159, ecapa_loss=0.0002191, whisper_loss=0.09444, over 3935205.09 frames. ], batch size: 69, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:36:42,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=826650.0, ans=0.04949747468305833 2024-08-11 00:37:25,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=826850.0, ans=0.0 2024-08-11 00:37:34,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=826950.0, ans=0.07 2024-08-11 00:37:40,979 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.078e+01 2.717e+01 3.021e+01 3.434e+01 5.708e+01, threshold=6.043e+01, percent-clipped=0.0 2024-08-11 00:37:48,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0 2024-08-11 00:37:56,904 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 from AS 2024-08-11 00:38:03,706 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10250, loss[loss=0.1134, beats_loss=0.01329, ecapa_loss=0.0001842, whisper_loss=0.09822, over 23882.00 frames. 
], tot_loss[loss=0.1078, beats_loss=0.01158, ecapa_loss=0.0002196, whisper_loss=0.09403, over 3895025.58 frames. ], batch size: 94, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:38:04,126 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 from AS 2024-08-11 00:38:21,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=827250.0, ans=0.125 2024-08-11 00:38:27,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.61 vs. limit=15.0 2024-08-11 00:38:31,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827250.0, ans=0.1 2024-08-11 00:38:36,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827350.0, ans=0.1 2024-08-11 00:38:37,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.47 vs. limit=15.0 2024-08-11 00:38:39,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=827350.0, ans=0.125 2024-08-11 00:38:41,334 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 17 from Vox, 34 from AS 2024-08-11 00:38:41,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=827350.0, ans=0.125 2024-08-11 00:38:45,836 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 from AS 2024-08-11 00:38:50,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.45 vs. 
limit=10.0 2024-08-11 00:39:04,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=827550.0, ans=0.125 2024-08-11 00:39:05,397 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 from AS 2024-08-11 00:39:19,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10300, loss[loss=0.1053, beats_loss=0.01211, ecapa_loss=0.0002472, whisper_loss=0.09076, over 22527.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01161, ecapa_loss=0.0002179, whisper_loss=0.09357, over 3904659.21 frames. ], batch size: 94, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:39:33,206 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.477e-01 2024-08-11 00:39:41,820 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-08-11 00:39:42,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=827750.0, ans=0.125 2024-08-11 00:39:58,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=827850.0, ans=0.125 2024-08-11 00:40:10,669 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:40:13,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.682e+01 2.875e+01 3.472e+01 4.715e+01, threshold=5.749e+01, percent-clipped=0.0 2024-08-11 00:40:36,206 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10350, loss[loss=0.1011, beats_loss=0.01061, ecapa_loss=0.0002241, whisper_loss=0.08827, over 16258.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01173, ecapa_loss=0.0002164, whisper_loss=0.09306, over 3904050.65 frames. 
], batch size: 62, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:40:36,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=828150.0, ans=0.0 2024-08-11 00:40:41,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=828150.0, ans=0.125 2024-08-11 00:40:55,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=828250.0, ans=0.0 2024-08-11 00:41:06,713 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 14 from Vox, 41 from AS 2024-08-11 00:41:07,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=828350.0, ans=0.2 2024-08-11 00:41:11,815 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 from AS 2024-08-11 00:41:19,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=828450.0, ans=0.125 2024-08-11 00:41:28,969 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 20 from Vox, 31 from AS 2024-08-11 00:41:32,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=828450.0, ans=0.5 2024-08-11 00:41:51,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=828550.0, ans=0.2 2024-08-11 00:41:54,224 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10400, loss[loss=0.1062, beats_loss=0.01184, ecapa_loss=0.0002143, whisper_loss=0.09225, over 18203.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01173, ecapa_loss=0.0002161, whisper_loss=0.09286, over 3896823.12 frames. 
], batch size: 70, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:41:57,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=828650.0, ans=0.07 2024-08-11 00:42:02,837 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-11 00:42:12,655 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 31 from Vox, 38 from AS 2024-08-11 00:42:29,103 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 16 from LS+wenet, 21 from Vox, 37 from AS 2024-08-11 00:42:32,072 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 17 from Vox, 20 from AS 2024-08-11 00:42:36,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=828850.0, ans=0.0 2024-08-11 00:42:42,461 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 19 from Vox, 20 from AS 2024-08-11 00:42:49,829 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.708e+01 2.999e+01 3.498e+01 5.568e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 00:43:11,221 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 25 from Vox, 33 from AS 2024-08-11 00:43:14,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10450, loss[loss=0.09349, beats_loss=0.01121, ecapa_loss=0.0002311, whisper_loss=0.07996, over 15602.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01166, ecapa_loss=0.0002162, whisper_loss=0.09273, over 3880789.97 frames. ], batch size: 66, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:43:44,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=829250.0, ans=0.0 2024-08-11 00:43:46,850 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
18 from LS+wenet, 25 from Vox, 30 from AS 2024-08-11 00:43:47,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=23.79 vs. limit=22.5 2024-08-11 00:44:14,933 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 from AS 2024-08-11 00:44:29,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=829550.0, ans=0.0 2024-08-11 00:44:35,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10500, loss[loss=0.09186, beats_loss=0.0132, ecapa_loss=0.000185, whisper_loss=0.07681, over 22676.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01167, ecapa_loss=0.0002184, whisper_loss=0.09234, over 3862701.77 frames. ], batch size: 90, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:44:43,099 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 20 from Vox, 42 from AS 2024-08-11 00:44:56,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=829750.0, ans=0.0 2024-08-11 00:45:04,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-08-11 00:45:05,418 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 28 from Vox, 31 from AS 2024-08-11 00:45:08,681 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 from AS 2024-08-11 00:45:17,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=829850.0, ans=0.2 2024-08-11 00:45:20,350 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
19 from LS+wenet, 18 from Vox, 25 from AS 2024-08-11 00:45:27,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.730e+01 2.985e+01 3.287e+01 5.938e+01, threshold=5.970e+01, percent-clipped=0.0 2024-08-11 00:45:31,824 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 34 from Vox, 33 from AS 2024-08-11 00:45:32,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=829950.0, ans=0.0 2024-08-11 00:45:48,302 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 15 from Vox, 38 from AS 2024-08-11 00:45:49,743 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10550, loss[loss=0.1188, beats_loss=0.01139, ecapa_loss=0.0001844, whisper_loss=0.1055, over 21436.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01163, ecapa_loss=0.0002189, whisper_loss=0.09247, over 3884824.60 frames. ], batch size: 84, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:45:50,527 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.10 vs. limit=15.0 2024-08-11 00:45:52,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=830150.0, ans=0.0 2024-08-11 00:46:21,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=830350.0, ans=0.1 2024-08-11 00:46:40,300 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 16 from Vox, 38 from AS 2024-08-11 00:47:08,486 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10600, loss[loss=0.08949, beats_loss=0.01208, ecapa_loss=0.0002145, whisper_loss=0.07526, over 16685.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01159, ecapa_loss=0.000219, whisper_loss=0.0926, over 3888043.79 frames. 
], batch size: 67, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:47:47,445 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 from AS 2024-08-11 00:47:52,392 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.40 vs. limit=10.0 2024-08-11 00:47:57,441 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 from AS 2024-08-11 00:48:00,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.647e+01 3.131e+01 3.600e+01 5.761e+01, threshold=6.263e+01, percent-clipped=0.0 2024-08-11 00:48:07,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=831050.0, ans=6.0 2024-08-11 00:48:13,647 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 18 from Vox, 35 from AS 2024-08-11 00:48:23,827 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10650, loss[loss=0.1009, beats_loss=0.01533, ecapa_loss=0.0001797, whisper_loss=0.08381, over 22071.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01163, ecapa_loss=0.0002172, whisper_loss=0.09319, over 3885067.56 frames. ], batch size: 90, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:48:35,573 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 14 from Vox, 40 from AS 2024-08-11 00:48:38,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.04 vs. limit=22.5 2024-08-11 00:48:41,902 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
24 from LS+wenet, 19 from Vox, 40 from AS 2024-08-11 00:48:50,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=831250.0, ans=0.125 2024-08-11 00:48:55,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=831350.0, ans=0.125 2024-08-11 00:49:07,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=12.0 2024-08-11 00:49:14,636 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 27 from Vox, 44 from AS 2024-08-11 00:49:31,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=831550.0, ans=0.0 2024-08-11 00:49:40,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10700, loss[loss=0.12, beats_loss=0.01156, ecapa_loss=0.0001711, whisper_loss=0.1067, over 15218.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01168, ecapa_loss=0.0002149, whisper_loss=0.09352, over 3871648.36 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:49:54,680 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 33 from LS+wenet, 16 from Vox, 32 from AS 2024-08-11 00:49:59,226 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 29 from Vox, 34 from AS 2024-08-11 00:50:00,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=831750.0, ans=0.125 2024-08-11 00:50:05,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. 
limit=15.0 2024-08-11 00:50:18,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=831850.0, ans=0.125 2024-08-11 00:50:31,516 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.817e+01 3.065e+01 3.573e+01 8.621e+01, threshold=6.130e+01, percent-clipped=1.0 2024-08-11 00:50:42,020 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 from AS 2024-08-11 00:50:49,486 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 from AS 2024-08-11 00:50:53,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10750, loss[loss=0.1422, beats_loss=0.008607, ecapa_loss=0.0002595, whisper_loss=0.131, over 15694.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01166, ecapa_loss=0.0002158, whisper_loss=0.09393, over 3892294.37 frames. ], batch size: 60, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:51:11,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=832250.0, ans=0.1 2024-08-11 00:51:16,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=832250.0, ans=0.0 2024-08-11 00:51:36,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=832350.0, ans=0.0 2024-08-11 00:51:37,409 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 27 from Vox, 34 from AS 2024-08-11 00:51:40,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=832450.0, ans=0.1 2024-08-11 00:51:44,260 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
23 from LS+wenet, 23 from Vox, 27 from AS 2024-08-11 00:52:05,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=832550.0, ans=0.2 2024-08-11 00:52:08,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=832550.0, ans=0.125 2024-08-11 00:52:11,009 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10800, loss[loss=0.09664, beats_loss=0.01245, ecapa_loss=0.0001813, whisper_loss=0.08238, over 23293.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01166, ecapa_loss=0.0002172, whisper_loss=0.09425, over 3912249.32 frames. ], batch size: 91, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:52:12,708 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 15 from LS+wenet, 20 from Vox, 39 from AS 2024-08-11 00:52:47,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.85 vs. 
limit=10.0 2024-08-11 00:52:49,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=832850.0, ans=0.125 2024-08-11 00:52:55,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=832950.0, ans=0.125 2024-08-11 00:53:04,607 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.723e+01 3.219e+01 3.827e+01 1.923e+02, threshold=6.438e+01, percent-clipped=1.0 2024-08-11 00:53:23,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=833050.0, ans=0.0 2024-08-11 00:53:24,322 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:53:26,868 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10850, loss[loss=0.09444, beats_loss=0.01329, ecapa_loss=0.000209, whisper_loss=0.07906, over 21961.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01173, ecapa_loss=0.0002165, whisper_loss=0.09447, over 3921765.21 frames. ], batch size: 94, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:53:39,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=833150.0, ans=0.125 2024-08-11 00:53:43,812 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 from AS 2024-08-11 00:54:00,898 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 from AS 2024-08-11 00:54:29,157 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:54:39,802 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.66 vs. 
limit=15.0 2024-08-11 00:54:43,348 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10900, loss[loss=0.1166, beats_loss=0.01216, ecapa_loss=0.0001988, whisper_loss=0.1025, over 21826.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0117, ecapa_loss=0.0002177, whisper_loss=0.09422, over 3914451.81 frames. ], batch size: 88, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:54:48,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=833650.0, ans=0.125 2024-08-11 00:54:49,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=833650.0, ans=0.1 2024-08-11 00:54:53,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=833650.0, ans=0.0 2024-08-11 00:54:54,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=833650.0, ans=0.07 2024-08-11 00:54:55,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=833650.0, ans=0.125 2024-08-11 00:55:01,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=833750.0, ans=0.125 2024-08-11 00:55:13,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=833850.0, ans=0.125 2024-08-11 00:55:17,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=833850.0, ans=0.0 2024-08-11 00:55:31,428 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
29 from LS+wenet, 25 from Vox, 33 from AS 2024-08-11 00:55:33,482 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:55:35,791 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.641e+01 2.975e+01 3.587e+01 5.714e+01, threshold=5.950e+01, percent-clipped=0.0 2024-08-11 00:55:37,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=833950.0, ans=0.0 2024-08-11 00:55:54,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=834050.0, ans=0.04949747468305833 2024-08-11 00:55:58,425 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 10950, loss[loss=0.1298, beats_loss=0.009417, ecapa_loss=0.0002032, whisper_loss=0.1184, over 23113.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01163, ecapa_loss=0.0002172, whisper_loss=0.09478, over 3892190.68 frames. ], batch size: 86, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:56:15,860 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 from AS 2024-08-11 00:56:21,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=834250.0, ans=0.0 2024-08-11 00:56:36,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=834350.0, ans=0.125 2024-08-11 00:57:01,684 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 21 from Vox, 33 from AS 2024-08-11 00:57:06,079 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 from AS 2024-08-11 00:57:08,849 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
33 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 00:57:13,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11000, loss[loss=0.09003, beats_loss=0.0128, ecapa_loss=0.0002435, whisper_loss=0.0748, over 18261.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01165, ecapa_loss=0.0002162, whisper_loss=0.09476, over 3936135.51 frames. ], batch size: 77, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:57:22,441 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2024-08-11 00:57:34,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=834750.0, ans=0.0 2024-08-11 00:58:06,073 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.83 vs. limit=15.0 2024-08-11 00:58:06,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 2.814e+01 3.042e+01 3.466e+01 5.998e+01, threshold=6.084e+01, percent-clipped=1.0 2024-08-11 00:58:25,836 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 30 from Vox, 37 from AS 2024-08-11 00:58:27,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=835050.0, ans=0.0 2024-08-11 00:58:30,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11050, loss[loss=0.1055, beats_loss=0.01217, ecapa_loss=0.000274, whisper_loss=0.09056, over 19162.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01162, ecapa_loss=0.0002163, whisper_loss=0.09464, over 3931659.17 frames. 
], batch size: 80, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:58:35,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=835150.0, ans=0.125 2024-08-11 00:58:44,868 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 00:58:54,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=835250.0, ans=0.125 2024-08-11 00:59:01,333 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 18 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 00:59:29,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=835450.0, ans=0.0 2024-08-11 00:59:30,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.17 vs. limit=15.0 2024-08-11 00:59:54,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=835550.0, ans=0.125 2024-08-11 00:59:55,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=835550.0, ans=0.0 2024-08-11 00:59:57,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=835650.0, ans=0.125 2024-08-11 00:59:58,322 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11100, loss[loss=0.1092, beats_loss=0.01381, ecapa_loss=0.0001672, whisper_loss=0.09368, over 20825.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01163, ecapa_loss=0.0002172, whisper_loss=0.09411, over 3876704.12 frames. ], batch size: 80, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:00:17,545 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 01:00:43,075 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-11 01:00:53,858 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.772e+01 3.086e+01 3.680e+01 7.620e+01, threshold=6.173e+01, percent-clipped=1.0 2024-08-11 01:01:08,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=836050.0, ans=0.2 2024-08-11 01:01:10,534 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 01:01:19,031 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11150, loss[loss=0.1229, beats_loss=0.009595, ecapa_loss=0.0002382, whisper_loss=0.1109, over 22167.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01153, ecapa_loss=0.0002182, whisper_loss=0.09383, over 3859494.71 frames. ], batch size: 87, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:01:30,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=15.0 2024-08-11 01:01:45,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.08 vs. limit=15.0 2024-08-11 01:02:00,223 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 01:02:14,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=836450.0, ans=0.07 2024-08-11 01:02:23,867 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2024-08-11 01:02:29,574 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
23 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 01:02:36,643 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11200, loss[loss=0.1184, beats_loss=0.01027, ecapa_loss=0.000202, whisper_loss=0.1061, over 20204.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01147, ecapa_loss=0.0002187, whisper_loss=0.09396, over 3841504.73 frames. ], batch size: 77, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:02:45,685 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 01:03:04,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-11 01:03:14,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=836850.0, ans=0.125 2024-08-11 01:03:20,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=836850.0, ans=0.0 2024-08-11 01:03:33,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=836950.0, ans=0.125 2024-08-11 01:03:35,270 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.788e+01 3.078e+01 3.604e+01 6.278e+01, threshold=6.156e+01, percent-clipped=2.0 2024-08-11 01:04:00,998 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11250, loss[loss=0.09824, beats_loss=0.01408, ecapa_loss=0.000195, whisper_loss=0.08221, over 22415.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01148, ecapa_loss=0.0002195, whisper_loss=0.09433, over 3858329.83 frames. ], batch size: 90, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:04:12,965 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.99 vs. 
limit=22.5 2024-08-11 01:04:17,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=837250.0, ans=0.125 2024-08-11 01:04:26,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=837250.0, ans=0.5 2024-08-11 01:04:36,162 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 01:04:44,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=837350.0, ans=0.0 2024-08-11 01:05:05,092 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 01:05:06,817 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 01:05:15,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=837550.0, ans=0.125 2024-08-11 01:05:17,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=837550.0, ans=0.2 2024-08-11 01:05:25,282 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11300, loss[loss=0.1028, beats_loss=0.01266, ecapa_loss=0.0002572, whisper_loss=0.08758, over 18481.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01156, ecapa_loss=0.0002195, whisper_loss=0.09361, over 3838940.22 frames. ], batch size: 78, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:05:31,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=837650.0, ans=0.0 2024-08-11 01:05:50,883 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 01:05:56,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-08-11 01:06:00,098 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 01:06:03,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=837850.0, ans=0.0 2024-08-11 01:06:05,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=22.5 2024-08-11 01:06:11,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2024-08-11 01:06:21,047 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.704e+01 3.204e+01 3.789e+01 1.454e+02, threshold=6.408e+01, percent-clipped=1.0 2024-08-11 01:06:24,477 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.723e-02 2024-08-11 01:06:32,120 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-11 01:06:42,668 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-11 01:06:45,817 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11350, loss[loss=0.09544, beats_loss=0.01368, ecapa_loss=0.0002214, whisper_loss=0.07954, over 22474.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01157, ecapa_loss=0.0002193, whisper_loss=0.09402, over 3869829.99 frames. 
], batch size: 94, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:07:24,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=838350.0, ans=15.0 2024-08-11 01:07:42,428 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 01:07:51,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=838550.0, ans=0.125 2024-08-11 01:08:03,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11400, loss[loss=0.1036, beats_loss=0.01276, ecapa_loss=0.0002241, whisper_loss=0.08865, over 21880.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01154, ecapa_loss=0.0002197, whisper_loss=0.09477, over 3845823.81 frames. ], batch size: 89, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:08:05,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=838650.0, ans=0.125 2024-08-11 01:08:07,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=838650.0, ans=0.0 2024-08-11 01:08:17,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=838650.0, ans=0.0 2024-08-11 01:08:20,735 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 01:08:44,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. 
limit=15.0 2024-08-11 01:08:55,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=838950.0, ans=0.0 2024-08-11 01:08:58,499 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.886e+01 3.314e+01 4.166e+01 1.030e+02, threshold=6.628e+01, percent-clipped=1.0 2024-08-11 01:09:00,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=12.0 2024-08-11 01:09:12,073 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 01:09:15,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=839050.0, ans=0.125 2024-08-11 01:09:18,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=839050.0, ans=0.0 2024-08-11 01:09:19,465 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 01:09:20,981 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11450, loss[loss=0.1144, beats_loss=0.01002, ecapa_loss=0.000228, whisper_loss=0.1021, over 16675.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01157, ecapa_loss=0.0002197, whisper_loss=0.09432, over 3870153.42 frames. ], batch size: 66, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:09:22,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839150.0, ans=0.1 2024-08-11 01:09:34,066 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
12 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 01:10:09,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=839350.0, ans=0.0 2024-08-11 01:10:16,680 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2024-08-11 01:10:26,342 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 01:10:33,869 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-11 01:10:42,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839650.0, ans=0.1 2024-08-11 01:10:44,076 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11500, loss[loss=0.1057, beats_loss=0.01152, ecapa_loss=0.0002132, whisper_loss=0.09206, over 13657.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01157, ecapa_loss=0.0002188, whisper_loss=0.09454, over 3868805.62 frames. ], batch size: 55, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:10:52,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=839650.0, ans=0.0 2024-08-11 01:10:55,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=839650.0, ans=0.125 2024-08-11 01:11:43,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+01 2.719e+01 3.134e+01 3.590e+01 4.797e+01, threshold=6.268e+01, percent-clipped=0.0 2024-08-11 01:11:51,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.46 vs. 
limit=22.5 2024-08-11 01:12:03,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=840050.0, ans=0.125 2024-08-11 01:12:06,913 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11550, loss[loss=0.1193, beats_loss=0.01256, ecapa_loss=0.0001815, whisper_loss=0.1049, over 23346.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01156, ecapa_loss=0.0002187, whisper_loss=0.09455, over 3888302.79 frames. ], batch size: 92, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:12:07,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.99 vs. limit=15.0 2024-08-11 01:12:08,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840150.0, ans=0.1 2024-08-11 01:12:14,668 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 01:12:28,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=840250.0, ans=0.125 2024-08-11 01:12:54,268 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-11 01:13:09,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. 
limit=15.0 2024-08-11 01:13:12,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=840550.0, ans=0.0 2024-08-11 01:13:12,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=840550.0, ans=0.0 2024-08-11 01:13:18,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=840550.0, ans=0.125 2024-08-11 01:13:27,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11600, loss[loss=0.09005, beats_loss=0.01171, ecapa_loss=0.0002076, whisper_loss=0.07627, over 17694.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01157, ecapa_loss=0.0002193, whisper_loss=0.09411, over 3866392.51 frames. ], batch size: 72, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:13:38,579 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 17 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 01:13:46,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.92 vs. limit=22.5 2024-08-11 01:13:56,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=840750.0, ans=0.0 2024-08-11 01:14:00,496 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 01:14:23,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+01 2.786e+01 3.126e+01 3.591e+01 6.008e+01, threshold=6.251e+01, percent-clipped=0.0 2024-08-11 01:14:26,696 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 01:14:35,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.73 vs. 
limit=22.5 2024-08-11 01:14:39,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=841050.0, ans=0.0 2024-08-11 01:14:45,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=841150.0, ans=0.125 2024-08-11 01:14:46,998 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11650, loss[loss=0.1055, beats_loss=0.01228, ecapa_loss=0.0001859, whisper_loss=0.0914, over 16219.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0115, ecapa_loss=0.000218, whisper_loss=0.09481, over 3911596.44 frames. ], batch size: 63, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:14:48,752 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 01:14:51,862 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-11 01:15:31,945 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 01:15:35,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=841450.0, ans=0.125 2024-08-11 01:15:41,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=841450.0, ans=0.1 2024-08-11 01:15:46,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=841450.0, ans=0.0 2024-08-11 01:15:51,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.95 vs. 
limit=15.0 2024-08-11 01:16:00,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=841550.0, ans=0.025 2024-08-11 01:16:03,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=841550.0, ans=0.0 2024-08-11 01:16:05,994 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11700, loss[loss=0.09997, beats_loss=0.01207, ecapa_loss=0.0001869, whisper_loss=0.08604, over 17278.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01158, ecapa_loss=0.0002193, whisper_loss=0.09456, over 3894111.06 frames. ], batch size: 69, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:16:40,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.34 vs. limit=22.5 2024-08-11 01:16:58,415 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 01:16:59,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.297e+01 2.883e+01 3.187e+01 3.882e+01 5.856e+01, threshold=6.374e+01, percent-clipped=0.0 2024-08-11 01:17:04,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2024-08-11 01:17:07,785 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-11 01:17:12,168 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 33 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 01:17:14,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=842050.0, ans=0.125 2024-08-11 01:17:23,009 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11750, loss[loss=0.1097, beats_loss=0.01297, ecapa_loss=0.000246, whisper_loss=0.09423, over 21191.00 frames. 
], tot_loss[loss=0.1086, beats_loss=0.01163, ecapa_loss=0.0002199, whisper_loss=0.09481, over 3921319.81 frames. ], batch size: 88, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:17:26,017 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 01:17:27,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=842150.0, ans=0.125 2024-08-11 01:17:32,471 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-11 01:17:38,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=842250.0, ans=0.1 2024-08-11 01:17:47,748 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 01:17:59,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=842350.0, ans=0.0 2024-08-11 01:18:15,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=842450.0, ans=0.2 2024-08-11 01:18:20,015 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 01:18:32,839 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 01:18:35,584 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2024-08-11 01:18:40,783 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11800, loss[loss=0.1181, beats_loss=0.00805, ecapa_loss=0.0002277, whisper_loss=0.1078, over 22049.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01167, ecapa_loss=0.0002189, whisper_loss=0.09413, over 3919420.49 frames. 
], batch size: 87, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:18:47,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=12.0 2024-08-11 01:18:48,472 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 01:18:51,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=842650.0, ans=0.0 2024-08-11 01:18:57,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=842750.0, ans=0.0 2024-08-11 01:19:05,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=842750.0, ans=0.0 2024-08-11 01:19:08,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.43 vs. limit=15.0 2024-08-11 01:19:10,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-11 01:19:25,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=842850.0, ans=0.125 2024-08-11 01:19:31,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=842950.0, ans=0.125 2024-08-11 01:19:36,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=842950.0, ans=0.2 2024-08-11 01:19:38,032 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.232e+01 2.831e+01 3.248e+01 3.772e+01 8.461e+01, threshold=6.495e+01, percent-clipped=3.0 2024-08-11 01:19:53,685 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
20 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-11 01:20:03,190 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11850, loss[loss=0.08899, beats_loss=0.0127, ecapa_loss=0.0001893, whisper_loss=0.07439, over 18386.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01165, ecapa_loss=0.0002202, whisper_loss=0.09343, over 3910647.65 frames. ], batch size: 71, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:20:21,695 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 01:20:32,494 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-11 01:20:41,790 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 01:20:43,514 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 27 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 01:20:57,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=843450.0, ans=0.0 2024-08-11 01:21:05,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=843550.0, ans=0.2 2024-08-11 01:21:10,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=843550.0, ans=0.0 2024-08-11 01:21:16,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843550.0, ans=0.1 2024-08-11 01:21:20,980 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11900, loss[loss=0.1035, beats_loss=0.009794, ecapa_loss=0.0002848, whisper_loss=0.09087, over 20338.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01157, ecapa_loss=0.0002208, whisper_loss=0.0942, over 3930912.03 frames. 
], batch size: 87, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:21:27,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=843650.0, ans=0.125 2024-08-11 01:21:33,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=843650.0, ans=0.125 2024-08-11 01:21:45,044 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 01:21:51,329 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 01:21:54,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=843850.0, ans=0.2 2024-08-11 01:22:09,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843950.0, ans=0.1 2024-08-11 01:22:11,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=843950.0, ans=22.5 2024-08-11 01:22:13,034 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.860e+01 3.257e+01 3.543e+01 6.146e+01, threshold=6.513e+01, percent-clipped=0.0 2024-08-11 01:22:14,764 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 01:22:14,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=843950.0, ans=0.1 2024-08-11 01:22:15,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. 
limit=15.0 2024-08-11 01:22:28,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=844050.0, ans=0.125 2024-08-11 01:22:34,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=844150.0, ans=0.0 2024-08-11 01:22:34,886 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 11950, loss[loss=0.1055, beats_loss=0.01044, ecapa_loss=0.0002113, whisper_loss=0.09297, over 21584.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01153, ecapa_loss=0.000219, whisper_loss=0.09422, over 3903052.69 frames. ], batch size: 87, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:22:54,740 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 01:23:00,723 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 01:23:00,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=844250.0, ans=0.125 2024-08-11 01:23:09,301 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 01:23:29,577 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.698e+02 2024-08-11 01:23:53,703 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12000, loss[loss=0.1277, beats_loss=0.009271, ecapa_loss=0.000283, whisper_loss=0.1156, over 19885.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01157, ecapa_loss=0.0002178, whisper_loss=0.09433, over 3893695.90 frames. 
], batch size: 80, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:23:53,704 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 01:24:32,691 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on ASR_libri: loss=0.2603, beats_loss=0, ecapa_loss=0.0006879, whisper_loss=0.2534, over 922467.00 frames. 2024-08-11 01:24:50,838 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on SV_voxceleb1: loss=0.005764, beats_loss=0, ecapa_loss=0.0005764, whisper_loss=0, over 939242.00 frames. 2024-08-11 01:26:40,377 INFO [train_multi_KD3.py:1149] (2/4) Epoch 6, validation on AT_audioset: loss=0.02599, beats_loss=0.02599, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 01:26:40,386 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 01:26:40,859 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.567e-01 2024-08-11 01:26:43,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=844650.0, ans=0.125 2024-08-11 01:27:11,116 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 01:27:11,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844850.0, ans=0.1 2024-08-11 01:27:12,519 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
30 from LS+wenet, 6 from Vox, 26 fro AS 2024-08-11 01:27:16,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=844850.0, ans=0.07 2024-08-11 01:27:17,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=844850.0, ans=0.0 2024-08-11 01:27:21,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=844850.0, ans=0.1 2024-08-11 01:27:35,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=844950.0, ans=0.1 2024-08-11 01:27:35,912 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.989e+01 3.252e+01 3.842e+01 6.267e+01, threshold=6.505e+01, percent-clipped=0.0 2024-08-11 01:27:40,751 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 01:27:58,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=845150.0, ans=0.2 2024-08-11 01:28:00,061 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12050, loss[loss=0.1087, beats_loss=0.01123, ecapa_loss=0.0002118, whisper_loss=0.09531, over 18888.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01154, ecapa_loss=0.000218, whisper_loss=0.0947, over 3870620.92 frames. ], batch size: 72, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:28:03,870 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 01:28:07,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=845150.0, ans=0.125 2024-08-11 01:28:31,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=845350.0, ans=0.2 2024-08-11 01:28:42,575 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 18 from LS+wenet, 20 from Vox, 52 fro AS 2024-08-11 01:29:05,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=845550.0, ans=0.04949747468305833 2024-08-11 01:29:12,270 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 01:29:17,327 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12100, loss[loss=0.11, beats_loss=0.01325, ecapa_loss=0.0001889, whisper_loss=0.09491, over 20351.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01159, ecapa_loss=0.0002184, whisper_loss=0.09477, over 3862833.99 frames. ], batch size: 82, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:29:17,454 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 01:29:50,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=845850.0, ans=0.1 2024-08-11 01:30:03,303 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
18 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-11 01:30:03,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=845950.0, ans=0.125 2024-08-11 01:30:10,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.614e+01 2.881e+01 3.224e+01 5.170e+01, threshold=5.763e+01, percent-clipped=0.0 2024-08-11 01:30:25,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=846050.0, ans=0.025 2024-08-11 01:30:32,830 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12150, loss[loss=0.1033, beats_loss=0.01507, ecapa_loss=0.0002018, whisper_loss=0.08622, over 21905.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01159, ecapa_loss=0.0002183, whisper_loss=0.09422, over 3869024.42 frames. ], batch size: 90, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:30:59,274 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-11 01:31:09,664 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 01:31:12,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=15.0 2024-08-11 01:31:15,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=846350.0, ans=0.125 2024-08-11 01:31:16,509 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 01:31:47,530 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 18 from Vox, 52 fro AS 2024-08-11 01:31:49,818 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12200, loss[loss=0.1175, beats_loss=0.01025, ecapa_loss=0.0001797, whisper_loss=0.1055, over 16935.00 frames. 
], tot_loss[loss=0.1084, beats_loss=0.01156, ecapa_loss=0.0002176, whisper_loss=0.09469, over 3872377.04 frames. ], batch size: 63, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:32:03,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=846650.0, ans=0.1 2024-08-11 01:32:04,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=846750.0, ans=0.125 2024-08-11 01:32:17,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=846750.0, ans=0.0 2024-08-11 01:32:17,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=846750.0, ans=0.125 2024-08-11 01:32:19,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=846850.0, ans=0.125 2024-08-11 01:32:30,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=846850.0, ans=0.1 2024-08-11 01:32:33,056 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 01:32:43,492 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.785e+01 3.124e+01 3.706e+01 5.181e+01, threshold=6.248e+01, percent-clipped=0.0 2024-08-11 01:32:44,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=12.0 2024-08-11 01:32:45,384 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 01:32:50,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=847050.0, ans=0.125 2024-08-11 01:33:04,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-11 01:33:08,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12250, loss[loss=0.1302, beats_loss=0.01095, ecapa_loss=0.0001998, whisper_loss=0.1172, over 19268.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0116, ecapa_loss=0.0002171, whisper_loss=0.09474, over 3910889.18 frames. ], batch size: 74, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:33:11,772 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 01:33:12,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=847150.0, ans=0.2 2024-08-11 01:33:27,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.30 vs. limit=22.5 2024-08-11 01:33:30,607 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 14 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 01:33:41,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2024-08-11 01:34:11,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=847550.0, ans=0.1 2024-08-11 01:34:18,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.44 vs. 
limit=15.0 2024-08-11 01:34:21,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.17 vs. limit=15.0 2024-08-11 01:34:28,131 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12300, loss[loss=0.1049, beats_loss=0.01283, ecapa_loss=0.000201, whisper_loss=0.09003, over 21866.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01159, ecapa_loss=0.0002169, whisper_loss=0.09432, over 3878425.91 frames. ], batch size: 89, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:34:28,671 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.134e+05 2024-08-11 01:34:54,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=847750.0, ans=0.125 2024-08-11 01:35:24,207 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.835e+01 3.125e+01 3.646e+01 6.261e+01, threshold=6.249e+01, percent-clipped=1.0 2024-08-11 01:35:24,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=847950.0, ans=0.125 2024-08-11 01:35:39,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=848050.0, ans=0.125 2024-08-11 01:35:43,732 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 01:35:47,650 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12350, loss[loss=0.09877, beats_loss=0.0146, ecapa_loss=0.0001748, whisper_loss=0.08242, over 22013.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01146, ecapa_loss=0.0002185, whisper_loss=0.09492, over 3909009.44 frames. 
], batch size: 89, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:35:50,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=848150.0, ans=0.2 2024-08-11 01:36:06,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=12.0 2024-08-11 01:36:12,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=848250.0, ans=0.09899494936611666 2024-08-11 01:36:22,712 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 01:36:24,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=848350.0, ans=0.0 2024-08-11 01:36:48,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.26 vs. limit=10.0 2024-08-11 01:36:56,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=848550.0, ans=0.2 2024-08-11 01:36:58,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=848550.0, ans=0.125 2024-08-11 01:37:02,370 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12400, loss[loss=0.1167, beats_loss=0.008813, ecapa_loss=0.0002165, whisper_loss=0.1057, over 21194.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01148, ecapa_loss=0.0002166, whisper_loss=0.09463, over 3896721.37 frames. ], batch size: 84, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:37:02,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=848650.0, ans=0.125 2024-08-11 01:37:32,233 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
15 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-11 01:37:33,444 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 01:37:38,574 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 01:37:38,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=848850.0, ans=0.035 2024-08-11 01:37:43,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=848850.0, ans=0.125 2024-08-11 01:37:49,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2024-08-11 01:37:54,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.646e+01 2.993e+01 3.533e+01 4.877e+01, threshold=5.986e+01, percent-clipped=0.0 2024-08-11 01:38:02,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=849050.0, ans=0.125 2024-08-11 01:38:13,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=849050.0, ans=0.125 2024-08-11 01:38:16,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=849150.0, ans=0.125 2024-08-11 01:38:16,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=849150.0, ans=0.2 2024-08-11 01:38:17,159 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12450, loss[loss=0.119, beats_loss=0.01251, ecapa_loss=0.0002129, whisper_loss=0.1044, over 19681.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01152, ecapa_loss=0.0002157, whisper_loss=0.09456, over 3908969.81 frames. 
], batch size: 79, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:38:20,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=849150.0, ans=0.125 2024-08-11 01:38:22,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=849150.0, ans=0.125 2024-08-11 01:38:24,910 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 01:38:48,494 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 32 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-11 01:39:06,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=849450.0, ans=0.0 2024-08-11 01:39:30,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=849550.0, ans=0.2 2024-08-11 01:39:31,177 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 01:39:32,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12500, loss[loss=0.09977, beats_loss=0.01247, ecapa_loss=0.0002222, whisper_loss=0.08507, over 21946.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01143, ecapa_loss=0.0002165, whisper_loss=0.09493, over 3899705.39 frames. 
], batch size: 90, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:39:37,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=849650.0, ans=0.07 2024-08-11 01:39:46,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=849650.0, ans=15.0 2024-08-11 01:39:55,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=849750.0, ans=10.0 2024-08-11 01:40:04,546 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 01:40:28,763 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.816e+01 3.133e+01 3.791e+01 6.148e+01, threshold=6.266e+01, percent-clipped=1.0 2024-08-11 01:40:28,926 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 01:40:33,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=849950.0, ans=0.0 2024-08-11 01:40:41,739 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.984e-02 2024-08-11 01:40:41,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=850050.0, ans=0.0 2024-08-11 01:40:46,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=850050.0, ans=0.2 2024-08-11 01:40:51,605 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12550, loss[loss=0.1037, beats_loss=0.01447, ecapa_loss=0.0002054, whisper_loss=0.08723, over 20859.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01154, ecapa_loss=0.000215, whisper_loss=0.09537, over 3936967.44 frames. 
], batch size: 84, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:40:58,906 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 01:41:21,079 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 01:41:25,393 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 01:41:27,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=850350.0, ans=0.04949747468305833 2024-08-11 01:41:38,733 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0 2024-08-11 01:41:53,037 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 01:42:10,855 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12600, loss[loss=0.1082, beats_loss=0.01201, ecapa_loss=0.0002129, whisper_loss=0.09402, over 13326.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01164, ecapa_loss=0.0002153, whisper_loss=0.09513, over 3919805.81 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:42:14,362 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 01:42:38,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=850750.0, ans=0.125 2024-08-11 01:42:45,488 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
35 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-11 01:42:51,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=850850.0, ans=0.125 2024-08-11 01:43:06,252 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.981e+01 3.398e+01 4.026e+01 7.168e+01, threshold=6.796e+01, percent-clipped=1.0 2024-08-11 01:43:08,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0 2024-08-11 01:43:19,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=851050.0, ans=0.125 2024-08-11 01:43:29,699 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12650, loss[loss=0.0812, beats_loss=0.01486, ecapa_loss=0.0002322, whisper_loss=0.06402, over 15386.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01157, ecapa_loss=0.0002172, whisper_loss=0.09503, over 3892841.59 frames. ], batch size: 64, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:43:36,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=851150.0, ans=0.0 2024-08-11 01:44:08,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5 2024-08-11 01:44:17,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2024-08-11 01:44:22,088 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. 
limit=15.0 2024-08-11 01:44:23,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=851450.0, ans=0.1 2024-08-11 01:44:30,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=851450.0, ans=0.0 2024-08-11 01:44:30,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=851450.0, ans=0.2 2024-08-11 01:44:48,255 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12700, loss[loss=0.1057, beats_loss=0.01119, ecapa_loss=0.0002095, whisper_loss=0.09244, over 22060.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01161, ecapa_loss=0.0002173, whisper_loss=0.0951, over 3888353.40 frames. ], batch size: 91, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:45:04,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=851750.0, ans=0.0 2024-08-11 01:45:08,259 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-11 01:45:14,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.47 vs. 
limit=15.0 2024-08-11 01:45:17,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=851850.0, ans=0.0 2024-08-11 01:45:38,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=851950.0, ans=0.125 2024-08-11 01:45:40,555 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.734e+01 2.989e+01 3.425e+01 5.621e+01, threshold=5.979e+01, percent-clipped=0.0 2024-08-11 01:45:48,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=852050.0, ans=0.1 2024-08-11 01:45:51,360 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 01:45:51,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=852050.0, ans=0.0 2024-08-11 01:46:01,100 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 01:46:04,383 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12750, loss[loss=0.1187, beats_loss=0.01022, ecapa_loss=0.0001955, whisper_loss=0.1065, over 23745.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.0116, ecapa_loss=0.0002177, whisper_loss=0.09547, over 3889186.03 frames. 
], batch size: 92, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:46:08,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=852150.0, ans=0.125 2024-08-11 01:46:14,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=852150.0, ans=0.125 2024-08-11 01:46:20,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=852250.0, ans=0.0 2024-08-11 01:46:33,069 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.23 vs. limit=6.0 2024-08-11 01:46:35,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=852350.0, ans=0.0 2024-08-11 01:46:36,716 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-11 01:46:39,475 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 01:46:41,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=852350.0, ans=0.125 2024-08-11 01:46:44,207 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 01:46:48,942 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 01:47:19,347 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12800, loss[loss=0.1246, beats_loss=0.009926, ecapa_loss=0.0002523, whisper_loss=0.1122, over 20231.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01158, ecapa_loss=0.0002201, whisper_loss=0.09503, over 3880778.68 frames. 
], batch size: 82, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:47:50,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=852850.0, ans=0.0 2024-08-11 01:47:52,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=852850.0, ans=0.125 2024-08-11 01:47:59,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=852850.0, ans=0.125 2024-08-11 01:48:09,537 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.804e+01 3.213e+01 3.707e+01 6.106e+01, threshold=6.425e+01, percent-clipped=1.0 2024-08-11 01:48:14,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=852950.0, ans=0.1 2024-08-11 01:48:22,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=853050.0, ans=0.0 2024-08-11 01:48:25,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.18 vs. limit=22.5 2024-08-11 01:48:30,672 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12850, loss[loss=0.09289, beats_loss=0.01173, ecapa_loss=0.0002161, whisper_loss=0.079, over 15483.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01168, ecapa_loss=0.0002198, whisper_loss=0.09423, over 3862383.56 frames. 
], batch size: 62, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:48:33,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=853150.0, ans=0.125 2024-08-11 01:48:35,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853150.0, ans=0.1 2024-08-11 01:48:37,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=853150.0, ans=0.0 2024-08-11 01:48:58,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=853350.0, ans=0.0 2024-08-11 01:49:05,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=853350.0, ans=0.0 2024-08-11 01:49:05,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853350.0, ans=0.1 2024-08-11 01:49:09,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=853350.0, ans=0.125 2024-08-11 01:49:17,692 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 01:49:38,820 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 01:49:41,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12900, loss[loss=0.09635, beats_loss=0.01142, ecapa_loss=0.0002126, whisper_loss=0.0828, over 17255.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01155, ecapa_loss=0.0002199, whisper_loss=0.09456, over 3847343.13 frames. ], batch size: 67, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:50:02,384 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
36 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-11 01:50:02,872 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=12.0 2024-08-11 01:50:09,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=853850.0, ans=0.125 2024-08-11 01:50:16,860 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-11 01:50:32,355 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.643e+01 2.887e+01 3.353e+01 5.409e+01, threshold=5.774e+01, percent-clipped=0.0 2024-08-11 01:50:55,008 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 12950, loss[loss=0.1239, beats_loss=0.00919, ecapa_loss=0.0002109, whisper_loss=0.1126, over 23471.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01145, ecapa_loss=0.0002187, whisper_loss=0.09461, over 3841953.06 frames. ], batch size: 91, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:51:23,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=854250.0, ans=0.125 2024-08-11 01:51:26,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=854350.0, ans=0.035 2024-08-11 01:51:29,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854350.0, ans=0.1 2024-08-11 01:51:36,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-08-11 01:51:42,619 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. 
limit=15.0 2024-08-11 01:52:00,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=854550.0, ans=0.0 2024-08-11 01:52:03,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=854550.0, ans=0.125 2024-08-11 01:52:10,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=854650.0, ans=0.0 2024-08-11 01:52:11,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13000, loss[loss=0.1187, beats_loss=0.0118, ecapa_loss=0.0002127, whisper_loss=0.1047, over 19842.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01154, ecapa_loss=0.0002194, whisper_loss=0.09381, over 3823640.55 frames. ], batch size: 80, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:52:49,148 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.19 vs. limit=22.5 2024-08-11 01:53:06,135 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.697e+01 3.039e+01 3.659e+01 7.134e+01, threshold=6.079e+01, percent-clipped=1.0 2024-08-11 01:53:13,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0 2024-08-11 01:53:29,961 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13050, loss[loss=0.09423, beats_loss=0.01394, ecapa_loss=0.0002139, whisper_loss=0.07816, over 21986.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01158, ecapa_loss=0.0002192, whisper_loss=0.09344, over 3837459.04 frames. 
], batch size: 89, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:53:33,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=855150.0, ans=0.125 2024-08-11 01:53:36,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855150.0, ans=0.1 2024-08-11 01:53:53,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=855250.0, ans=0.125 2024-08-11 01:54:13,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=855350.0, ans=0.0 2024-08-11 01:54:29,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=855450.0, ans=0.125 2024-08-11 01:54:40,653 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.71 vs. limit=10.0 2024-08-11 01:54:47,749 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13100, loss[loss=0.08642, beats_loss=0.014, ecapa_loss=0.0001726, whisper_loss=0.07069, over 15336.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01158, ecapa_loss=0.0002175, whisper_loss=0.09331, over 3831456.13 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:54:59,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=855650.0, ans=0.125 2024-08-11 01:55:17,195 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 01:55:17,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855850.0, ans=0.1 2024-08-11 01:55:19,802 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
24 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 01:55:37,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=855950.0, ans=0.0 2024-08-11 01:55:41,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=855950.0, ans=0.025 2024-08-11 01:55:44,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.870e+01 3.154e+01 3.850e+01 5.715e+01, threshold=6.308e+01, percent-clipped=0.0 2024-08-11 01:55:51,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=856050.0, ans=0.125 2024-08-11 01:55:53,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=856050.0, ans=0.125 2024-08-11 01:56:08,180 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13150, loss[loss=0.1052, beats_loss=0.01079, ecapa_loss=0.0002666, whisper_loss=0.09176, over 21465.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01156, ecapa_loss=0.0002191, whisper_loss=0.09299, over 3834873.99 frames. ], batch size: 91, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:56:12,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=856150.0, ans=0.2 2024-08-11 01:56:24,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=856250.0, ans=0.125 2024-08-11 01:56:36,765 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=12.0 2024-08-11 01:56:42,267 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
22 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-11 01:56:45,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=856350.0, ans=0.125 2024-08-11 01:56:47,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=856350.0, ans=0.0 2024-08-11 01:56:53,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=15.0 2024-08-11 01:56:59,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=856450.0, ans=0.0 2024-08-11 01:57:06,128 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 18 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 01:57:21,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2024-08-11 01:57:25,603 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13200, loss[loss=0.1211, beats_loss=0.01066, ecapa_loss=0.000213, whisper_loss=0.1083, over 20688.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01153, ecapa_loss=0.0002186, whisper_loss=0.09369, over 3856306.92 frames. 
], batch size: 79, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:57:29,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=856650.0, ans=0.125 2024-08-11 01:57:38,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=856650.0, ans=0.0 2024-08-11 01:57:55,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=856850.0, ans=0.07 2024-08-11 01:58:13,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=856950.0, ans=0.5 2024-08-11 01:58:17,171 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.805e+01 3.191e+01 3.827e+01 5.209e+01, threshold=6.381e+01, percent-clipped=0.0 2024-08-11 01:58:19,992 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 01:58:24,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857050.0, ans=0.1 2024-08-11 01:58:31,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=857050.0, ans=0.125 2024-08-11 01:58:33,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857050.0, ans=0.1 2024-08-11 01:58:38,843 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13250, loss[loss=0.1303, beats_loss=0.00947, ecapa_loss=0.0002315, whisper_loss=0.1185, over 18901.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01155, ecapa_loss=0.0002196, whisper_loss=0.09352, over 3836376.59 frames. 
], batch size: 73, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:58:46,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=857150.0, ans=0.07 2024-08-11 01:58:58,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=857250.0, ans=0.2 2024-08-11 01:59:17,609 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 01:59:22,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=22.5 2024-08-11 01:59:32,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=857450.0, ans=0.125 2024-08-11 01:59:38,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=857550.0, ans=0.125 2024-08-11 01:59:44,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=857550.0, ans=0.125 2024-08-11 01:59:49,411 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13300, loss[loss=0.1083, beats_loss=0.01322, ecapa_loss=0.0002344, whisper_loss=0.09272, over 23000.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01158, ecapa_loss=0.0002191, whisper_loss=0.09361, over 3847905.92 frames. 
], batch size: 93, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 02:00:02,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=857750.0, ans=0.0 2024-08-11 02:00:12,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=857750.0, ans=0.125 2024-08-11 02:00:16,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=857850.0, ans=0.125 2024-08-11 02:00:23,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=857850.0, ans=0.0 2024-08-11 02:00:25,869 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 02:00:37,638 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.722e+01 2.995e+01 3.352e+01 6.535e+01, threshold=5.989e+01, percent-clipped=1.0 2024-08-11 02:00:40,513 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 02:00:57,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13350, loss[loss=0.1013, beats_loss=0.01157, ecapa_loss=0.0002153, whisper_loss=0.08758, over 14550.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01157, ecapa_loss=0.0002184, whisper_loss=0.09441, over 3838557.24 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 02:01:14,374 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 02:01:17,243 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 02:01:18,770 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 02:01:26,969 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
20 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 02:01:44,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=858450.0, ans=0.0 2024-08-11 02:01:49,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=858450.0, ans=0.015 2024-08-11 02:01:49,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=858450.0, ans=0.07 2024-08-11 02:02:02,582 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 02:02:04,950 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13400, loss[loss=0.09862, beats_loss=0.01107, ecapa_loss=0.0002237, whisper_loss=0.08531, over 17635.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01144, ecapa_loss=0.000219, whisper_loss=0.09496, over 3859093.60 frames. ], batch size: 71, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 02:02:26,804 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
26 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-11 02:02:28,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=858750.0, ans=0.2 2024-08-11 02:02:31,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=858850.0, ans=0.2 2024-08-11 02:02:36,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858850.0, ans=0.1 2024-08-11 02:02:37,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=858850.0, ans=0.125 2024-08-11 02:02:44,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=858950.0, ans=0.0 2024-08-11 02:02:51,279 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.837e+01 3.208e+01 3.826e+01 8.458e+01, threshold=6.417e+01, percent-clipped=4.0 2024-08-11 02:02:52,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2024-08-11 02:03:11,062 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13450, loss[loss=0.1151, beats_loss=0.01111, ecapa_loss=0.0002006, whisper_loss=0.102, over 23262.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01136, ecapa_loss=0.0002197, whisper_loss=0.09519, over 3875214.37 frames. ], batch size: 93, lr: 1.00e-02, grad_scale: 140737488355328.0 2024-08-11 02:03:27,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2024-08-11 02:03:51,041 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 02:03:54,639 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
24 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 02:03:56,065 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 02:04:15,167 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-11 02:04:18,175 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13500, loss[loss=0.1139, beats_loss=0.01223, ecapa_loss=0.0002001, whisper_loss=0.09962, over 23051.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01144, ecapa_loss=0.0002202, whisper_loss=0.09522, over 3905875.87 frames. ], batch size: 90, lr: 1.00e-02, grad_scale: 140737488355328.0 2024-08-11 02:04:22,492 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-11 02:04:31,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.01 vs. limit=15.0 2024-08-11 02:04:45,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=859850.0, ans=0.125 2024-08-11 02:04:52,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=859850.0, ans=0.125 2024-08-11 02:04:53,239 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-11 02:04:54,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=859850.0, ans=0.2 2024-08-11 02:05:00,234 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.41 vs. 
limit=15.0 2024-08-11 02:05:04,815 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.793e+01 3.249e+01 3.860e+01 6.225e+01, threshold=6.498e+01, percent-clipped=0.0 2024-08-11 02:05:24,742 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13550, loss[loss=0.1112, beats_loss=0.01163, ecapa_loss=0.0001977, whisper_loss=0.09755, over 21943.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01138, ecapa_loss=0.0002216, whisper_loss=0.09481, over 3901369.16 frames. ], batch size: 85, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:05:41,561 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 02:05:43,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=860250.0, ans=0.125 2024-08-11 02:05:44,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=860250.0, ans=0.09899494936611666 2024-08-11 02:05:50,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=860250.0, ans=0.125 2024-08-11 02:05:59,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860350.0, ans=0.1 2024-08-11 02:06:10,294 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 02:06:12,413 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 02:06:23,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=860550.0, ans=0.025 2024-08-11 02:06:34,082 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13600, loss[loss=0.1103, beats_loss=0.01062, ecapa_loss=0.0001694, whisper_loss=0.09795, over 17492.00 frames. 
], tot_loss[loss=0.1085, beats_loss=0.01141, ecapa_loss=0.0002189, whisper_loss=0.09492, over 3889979.06 frames. ], batch size: 64, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:06:57,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=860750.0, ans=0.125 2024-08-11 02:07:00,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=860750.0, ans=0.125 2024-08-11 02:07:14,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=860950.0, ans=0.125 2024-08-11 02:07:23,382 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.998e+01 3.369e+01 4.005e+01 6.707e+01, threshold=6.738e+01, percent-clipped=1.0 2024-08-11 02:07:25,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.47 vs. limit=22.5 2024-08-11 02:07:36,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=861050.0, ans=0.0 2024-08-11 02:07:41,683 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 02:07:42,986 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-11 02:07:44,129 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13650, loss[loss=0.1078, beats_loss=0.01005, ecapa_loss=0.0002044, whisper_loss=0.0957, over 21203.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01148, ecapa_loss=0.0002187, whisper_loss=0.09402, over 3869041.67 frames. 
], batch size: 82, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:07:51,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=861150.0, ans=0.125 2024-08-11 02:07:58,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=861250.0, ans=0.1 2024-08-11 02:07:59,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861250.0, ans=0.1 2024-08-11 02:08:20,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=861350.0, ans=0.2 2024-08-11 02:08:21,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=861350.0, ans=0.2 2024-08-11 02:08:25,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=861450.0, ans=0.0 2024-08-11 02:08:31,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=861450.0, ans=0.04949747468305833 2024-08-11 02:08:54,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13700, loss[loss=0.1209, beats_loss=0.01066, ecapa_loss=0.0002252, whisper_loss=0.108, over 23566.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01158, ecapa_loss=0.0002187, whisper_loss=0.09371, over 3864010.01 frames. ], batch size: 93, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:09:04,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=861650.0, ans=0.02 2024-08-11 02:09:04,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. 
limit=6.0 2024-08-11 02:09:20,864 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 02:09:40,235 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 02:09:41,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=861950.0, ans=0.0 2024-08-11 02:09:41,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=861950.0, ans=10.0 2024-08-11 02:09:43,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=861950.0, ans=15.0 2024-08-11 02:09:44,016 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.742e+01 3.072e+01 3.573e+01 1.415e+02, threshold=6.145e+01, percent-clipped=1.0 2024-08-11 02:09:59,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=862050.0, ans=0.2 2024-08-11 02:10:00,932 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 02:10:05,030 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13750, loss[loss=0.07178, beats_loss=0.01503, ecapa_loss=0.0001857, whisper_loss=0.05489, over 16307.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01156, ecapa_loss=0.0002182, whisper_loss=0.09476, over 3894415.34 frames. ], batch size: 68, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:11:14,646 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13800, loss[loss=0.09153, beats_loss=0.01075, ecapa_loss=0.0002673, whisper_loss=0.0781, over 16769.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01147, ecapa_loss=0.0002186, whisper_loss=0.09467, over 3882988.41 frames. 
], batch size: 72, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:11:14,829 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-11 02:11:24,851 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 16 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 02:11:36,566 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 16 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-11 02:11:41,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862750.0, ans=0.1 2024-08-11 02:11:46,395 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 35 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 02:11:49,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=862850.0, ans=0.125 2024-08-11 02:11:56,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.48 vs. limit=10.0 2024-08-11 02:11:58,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=862950.0, ans=0.1 2024-08-11 02:12:04,974 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.614e+01 2.961e+01 3.435e+01 1.383e+02, threshold=5.922e+01, percent-clipped=1.0 2024-08-11 02:12:22,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=863050.0, ans=0.125 2024-08-11 02:12:23,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.37 vs. limit=22.5 2024-08-11 02:12:26,436 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13850, loss[loss=0.1033, beats_loss=0.01032, ecapa_loss=0.0002054, whisper_loss=0.09088, over 23012.00 frames. 
], tot_loss[loss=0.1093, beats_loss=0.01136, ecapa_loss=0.000218, whisper_loss=0.09573, over 3888286.43 frames. ], batch size: 91, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:12:49,343 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-11 02:12:52,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=863250.0, ans=0.125 2024-08-11 02:13:00,652 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 02:13:12,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=863450.0, ans=0.125 2024-08-11 02:13:15,444 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 02:13:20,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=863450.0, ans=0.0 2024-08-11 02:13:36,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=863650.0, ans=0.125 2024-08-11 02:13:36,766 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13900, loss[loss=0.1079, beats_loss=0.01197, ecapa_loss=0.0002143, whisper_loss=0.09382, over 15034.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01144, ecapa_loss=0.0002171, whisper_loss=0.09579, over 3904871.84 frames. 
], batch size: 59, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:13:47,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=863650.0, ans=0.125 2024-08-11 02:14:04,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=863850.0, ans=0.125 2024-08-11 02:14:09,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=863850.0, ans=0.125 2024-08-11 02:14:19,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=863950.0, ans=0.1 2024-08-11 02:14:19,680 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2024-08-11 02:14:23,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 2.760e+01 3.035e+01 3.739e+01 6.215e+01, threshold=6.069e+01, percent-clipped=1.0 2024-08-11 02:14:23,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=863950.0, ans=0.125 2024-08-11 02:14:31,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=864050.0, ans=0.125 2024-08-11 02:14:38,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=864050.0, ans=0.0 2024-08-11 02:14:42,274 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 13950, loss[loss=0.08974, beats_loss=0.01314, ecapa_loss=0.0001739, whisper_loss=0.07486, over 17687.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01147, ecapa_loss=0.0002158, whisper_loss=0.09533, over 3878474.74 frames. 
], batch size: 69, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:14:45,022 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 02:14:46,567 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.867e+05 2024-08-11 02:14:51,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=864150.0, ans=0.0 2024-08-11 02:14:53,964 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 02:15:07,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=864350.0, ans=0.1 2024-08-11 02:15:12,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=864350.0, ans=0.2 2024-08-11 02:15:13,463 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 02:15:22,818 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 38 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 02:15:34,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.45 vs. limit=15.0 2024-08-11 02:15:38,374 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 02:15:47,375 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 14000, loss[loss=0.09955, beats_loss=0.01244, ecapa_loss=0.0001882, whisper_loss=0.08523, over 13297.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01137, ecapa_loss=0.000214, whisper_loss=0.09586, over 3890036.45 frames. ], batch size: 53, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:15:51,595 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
18 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-11 02:15:52,957 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 02:15:58,312 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 02:16:17,891 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 27 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 02:16:20,417 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-11 02:16:26,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-11 02:16:31,724 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.23 vs. limit=15.0 2024-08-11 02:16:32,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=864950.0, ans=0.125 2024-08-11 02:16:33,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.879e+01 3.227e+01 3.709e+01 6.302e+01, threshold=6.454e+01, percent-clipped=1.0 2024-08-11 02:16:45,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=865050.0, ans=0.1 2024-08-11 02:16:52,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 14050, loss[loss=0.11, beats_loss=0.01104, ecapa_loss=0.0002383, whisper_loss=0.09653, over 16550.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01137, ecapa_loss=0.0002149, whisper_loss=0.09602, over 3857698.85 frames. 
], batch size: 69, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:16:57,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=865150.0, ans=0.09899494936611666 2024-08-11 02:17:08,037 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-11 02:17:24,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=865350.0, ans=0.0 2024-08-11 02:17:32,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-11 02:17:46,483 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-11 02:17:51,050 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2024-08-11 02:17:57,950 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 14100, loss[loss=0.1169, beats_loss=0.009538, ecapa_loss=0.0003178, whisper_loss=0.1042, over 15056.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01137, ecapa_loss=0.0002163, whisper_loss=0.09595, over 3838183.92 frames. ], batch size: 64, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:18:08,830 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 02:18:18,482 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 02:18:23,956 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
14 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-11 02:18:28,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=865850.0, ans=0.1 2024-08-11 02:18:28,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=865850.0, ans=0.1 2024-08-11 02:18:37,363 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 02:18:41,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=865950.0, ans=0.125 2024-08-11 02:18:44,991 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.713e+01 2.992e+01 3.543e+01 5.369e+01, threshold=5.983e+01, percent-clipped=0.0 2024-08-11 02:18:46,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=865950.0, ans=0.0 2024-08-11 02:18:53,166 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 02:19:04,983 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 14150, loss[loss=0.1239, beats_loss=0.008701, ecapa_loss=0.0002118, whisper_loss=0.1131, over 20730.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01139, ecapa_loss=0.0002142, whisper_loss=0.09602, over 3822211.29 frames. 
], batch size: 79, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:19:22,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=866250.0, ans=0.0 2024-08-11 02:19:30,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=866350.0, ans=0.125 2024-08-11 02:19:53,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=866450.0, ans=0.125 2024-08-11 02:20:10,532 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 14200, loss[loss=0.1165, beats_loss=0.01166, ecapa_loss=0.000171, whisper_loss=0.1031, over 19921.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01138, ecapa_loss=0.0002146, whisper_loss=0.09586, over 3842262.37 frames. ], batch size: 75, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:20:13,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=866650.0, ans=0.125 2024-08-11 02:20:14,909 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.050e-01 2024-08-11 02:20:25,297 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 02:20:34,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=866750.0, ans=0.2 2024-08-11 02:20:36,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=866850.0, ans=0.125 2024-08-11 02:20:40,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=866850.0, ans=0.1 2024-08-11 02:20:41,253 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
16 from LS+wenet, 7 from Vox, 31 fro AS 2024-08-11 02:20:41,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=866850.0, ans=0.07 2024-08-11 02:20:44,226 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 02:20:57,900 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.819e+01 3.173e+01 3.823e+01 7.553e+01, threshold=6.347e+01, percent-clipped=1.0 2024-08-11 02:21:15,449 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-11 02:21:19,256 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 14250, loss[loss=0.09847, beats_loss=0.01358, ecapa_loss=0.0002153, whisper_loss=0.08273, over 18969.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01142, ecapa_loss=0.0002131, whisper_loss=0.09594, over 3872314.28 frames. ], batch size: 77, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:21:31,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.76 vs. limit=22.5 2024-08-11 02:21:40,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=8.0 2024-08-11 02:21:50,830 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 02:21:58,470 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 02:21:59,798 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 02:22:06,498 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 02:22:14,459 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
31 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 02:22:15,991 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 02:22:26,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 14300, loss[loss=0.1189, beats_loss=0.009858, ecapa_loss=0.0001982, whisper_loss=0.1071, over 17503.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01157, ecapa_loss=0.000212, whisper_loss=0.09532, over 3851988.15 frames. ], batch size: 68, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:22:28,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=867650.0, ans=0.125 2024-08-11 02:22:29,781 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 02:22:30,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2024-08-11 02:22:32,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=867650.0, ans=0.025 2024-08-11 02:22:47,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=867750.0, ans=0.125 2024-08-11 02:22:58,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2024-08-11 02:23:11,594 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.03 vs. limit=12.0 2024-08-11 02:23:11,976 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.633e+01 2.947e+01 3.319e+01 6.322e+01, threshold=5.893e+01, percent-clipped=0.0 2024-08-11 02:23:15,963 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 02:23:18,898 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.702e+05 2024-08-11 02:23:31,133 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 14350, loss[loss=0.1225, beats_loss=0.009046, ecapa_loss=0.0002551, whisper_loss=0.1109, over 21085.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01158, ecapa_loss=0.0002129, whisper_loss=0.09464, over 3844927.20 frames. ], batch size: 84, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:23:34,934 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=22.5 2024-08-11 02:23:41,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=868150.0, ans=0.125 2024-08-11 02:23:46,076 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 21 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-11 02:23:48,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=868250.0, ans=15.0 2024-08-11 02:24:06,852 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 02:24:09,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=868450.0, ans=0.0 2024-08-11 02:24:13,062 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
26 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-11 02:24:18,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=868450.0, ans=0.1 2024-08-11 02:24:31,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=868550.0, ans=0.2 2024-08-11 02:24:35,720 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 14400, loss[loss=0.1094, beats_loss=0.01028, ecapa_loss=0.000266, whisper_loss=0.09649, over 16256.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01151, ecapa_loss=0.0002143, whisper_loss=0.09564, over 3889467.85 frames. ], batch size: 66, lr: 9.99e-03, grad_scale: 281474976710656.0 2024-08-11 02:24:42,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=868650.0, ans=0.125 2024-08-11 02:24:55,418 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 02:24:58,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=868750.0, ans=0.125 2024-08-11 02:25:03,553 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 02:25:20,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=868950.0, ans=0.125 2024-08-11 02:25:21,371 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.691e+01 3.158e+01 3.511e+01 8.025e+01, threshold=6.317e+01, percent-clipped=1.0 2024-08-11 02:25:23,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=868950.0, ans=0.2 2024-08-11 02:25:26,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=869050.0, ans=0.125 2024-08-11 02:25:28,932 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 02:25:39,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=869150.0, ans=0.0 2024-08-11 02:25:40,711 INFO [train_multi_KD3.py:1116] (2/4) Epoch 6, batch 14450, loss[loss=0.1032, beats_loss=0.01396, ecapa_loss=0.0001754, whisper_loss=0.0875, over 21404.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01164, ecapa_loss=0.0002134, whisper_loss=0.09416, over 3910214.78 frames. ], batch size: 88, lr: 9.99e-03, grad_scale: 281474976710656.0 2024-08-11 02:25:44,792 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
25 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 02:25:48,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=869150.0, ans=0.125 2024-08-11 02:25:56,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=869250.0, ans=0.5 2024-08-11 02:25:59,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=869250.0, ans=0.1 2024-08-11 02:26:06,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=869350.0, ans=0.125 2024-08-11 02:26:09,840 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.417e-02 2024-08-11 02:26:15,591 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.81 vs. limit=10.0 2024-08-11 02:26:19,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=869450.0, ans=0.0 2024-08-11 02:26:21,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2024-08-11 02:26:33,043 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 02:26:33,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=869550.0, ans=0.0 2024-08-11 02:27:16,264 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 0, loss[loss=0.0876, beats_loss=0.01165, ecapa_loss=0.0002198, whisper_loss=0.07375, over 15762.00 frames. ], tot_loss[loss=0.0876, beats_loss=0.01165, ecapa_loss=0.0002198, whisper_loss=0.07375, over 15762.00 frames. 
], batch size: 62, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:27:16,265 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 02:28:00,251 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on ASR_libri: loss=0.2587, beats_loss=0, ecapa_loss=0.0006864, whisper_loss=0.2518, over 922467.00 frames. 2024-08-11 02:28:18,644 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on SV_voxceleb1: loss=0.00579, beats_loss=0, ecapa_loss=0.000579, whisper_loss=0, over 939242.00 frames. 2024-08-11 02:29:55,830 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7490, 1.3884, 2.0022, 2.1423], device='cuda:2') 2024-08-11 02:30:27,743 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on AT_audioset: loss=0.02579, beats_loss=0.02579, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 02:30:27,746 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 02:30:28,508 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.98 vs. 
limit=22.5 2024-08-11 02:30:33,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=869590.0, ans=0.125 2024-08-11 02:31:05,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=869690.0, ans=0.1 2024-08-11 02:31:22,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=869790.0, ans=0.125 2024-08-11 02:31:29,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=869790.0, ans=0.1 2024-08-11 02:32:35,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+01 2.976e+01 3.314e+01 3.996e+01 6.220e+01, threshold=6.628e+01, percent-clipped=0.0 2024-08-11 02:32:54,350 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.605e-01 2024-08-11 02:32:56,101 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 02:33:11,169 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 50, loss[loss=0.09571, beats_loss=0.01275, ecapa_loss=0.0002413, whisper_loss=0.08055, over 21734.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01105, ecapa_loss=0.0002241, whisper_loss=0.09367, over 883515.29 frames. ], batch size: 88, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:33:14,469 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 02:34:20,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=870190.0, ans=0.125 2024-08-11 02:36:03,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=870490.0, ans=0.0 2024-08-11 02:36:18,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 100, loss[loss=0.1098, beats_loss=0.009516, ecapa_loss=0.0001986, whisper_loss=0.09827, over 14666.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01106, ecapa_loss=0.0002211, whisper_loss=0.09295, over 1514985.27 frames. ], batch size: 55, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:36:26,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=870590.0, ans=0.125 2024-08-11 02:37:09,711 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-11 02:37:26,579 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.25 vs. limit=22.5 2024-08-11 02:37:57,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=870790.0, ans=0.5 2024-08-11 02:38:15,009 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
29 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 02:38:18,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=870890.0, ans=0.04949747468305833 2024-08-11 02:38:32,352 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.551e+01 3.124e+01 3.380e+01 3.805e+01 6.032e+01, threshold=6.760e+01, percent-clipped=0.0 2024-08-11 02:38:41,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=870990.0, ans=0.125 2024-08-11 02:38:49,759 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 150, loss[loss=0.1237, beats_loss=0.01094, ecapa_loss=0.0001806, whisper_loss=0.1109, over 17643.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01083, ecapa_loss=0.0002146, whisper_loss=0.09471, over 2040642.07 frames. ], batch size: 66, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:38:56,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=871090.0, ans=0.0 2024-08-11 02:39:09,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=871190.0, ans=0.0 2024-08-11 02:39:09,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=871190.0, ans=0.1 2024-08-11 02:39:47,537 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.05 vs. limit=10.0 2024-08-11 02:39:54,370 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 02:39:57,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=871390.0, ans=0.0 2024-08-11 02:39:59,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=871490.0, ans=0.0 2024-08-11 02:40:13,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=871490.0, ans=0.0 2024-08-11 02:40:13,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=871490.0, ans=0.1 2024-08-11 02:40:16,135 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 200, loss[loss=0.1161, beats_loss=0.01361, ecapa_loss=0.0001542, whisper_loss=0.1009, over 22259.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01091, ecapa_loss=0.0002123, whisper_loss=0.09494, over 2420152.38 frames. ], batch size: 85, lr: 9.35e-03, grad_scale: 281474976710656.0 2024-08-11 02:40:29,994 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 02:40:36,557 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 20 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-11 02:40:37,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.64 vs. limit=10.0 2024-08-11 02:40:54,355 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
17 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 02:41:07,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=871890.0, ans=0.95 2024-08-11 02:41:21,797 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+01 2.791e+01 3.109e+01 3.398e+01 1.022e+02, threshold=6.218e+01, percent-clipped=1.0 2024-08-11 02:41:21,990 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 02:41:35,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 250, loss[loss=0.1149, beats_loss=0.01083, ecapa_loss=0.0002623, whisper_loss=0.1015, over 21915.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01124, ecapa_loss=0.0002133, whisper_loss=0.09356, over 2747238.93 frames. ], batch size: 88, lr: 9.35e-03, grad_scale: 281474976710656.0 2024-08-11 02:41:50,315 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 02:42:23,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=872290.0, ans=0.0 2024-08-11 02:42:30,975 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 02:42:42,936 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-11 02:42:57,953 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 300, loss[loss=0.09406, beats_loss=0.01271, ecapa_loss=0.0001915, whisper_loss=0.07943, over 22564.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01126, ecapa_loss=0.0002128, whisper_loss=0.0926, over 2957532.46 frames. ], batch size: 90, lr: 9.35e-03, grad_scale: 281474976710656.0 2024-08-11 02:43:05,755 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 02:43:10,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=872590.0, ans=0.125 2024-08-11 02:43:12,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=872690.0, ans=0.125 2024-08-11 02:43:23,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=872690.0, ans=0.04949747468305833 2024-08-11 02:43:47,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=872890.0, ans=0.1 2024-08-11 02:44:02,255 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.643e+01 2.910e+01 3.334e+01 5.693e+01, threshold=5.820e+01, percent-clipped=0.0 2024-08-11 02:44:15,858 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 350, loss[loss=0.09308, beats_loss=0.01365, ecapa_loss=0.0002454, whisper_loss=0.07698, over 18475.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01121, ecapa_loss=0.0002121, whisper_loss=0.09309, over 3149895.37 frames. ], batch size: 79, lr: 9.34e-03, grad_scale: 281474976710656.0 2024-08-11 02:44:24,168 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.588e-01 2024-08-11 02:44:25,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=873090.0, ans=10.0 2024-08-11 02:44:30,186 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 02:44:30,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.39 vs. 
limit=22.5 2024-08-11 02:44:32,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873190.0, ans=0.1 2024-08-11 02:44:40,819 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 02:45:27,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=873490.0, ans=0.0 2024-08-11 02:45:27,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=873490.0, ans=0.2 2024-08-11 02:45:31,549 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 02:45:32,680 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 400, loss[loss=0.09206, beats_loss=0.0129, ecapa_loss=0.0002072, whisper_loss=0.07709, over 20260.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.0113, ecapa_loss=0.0002104, whisper_loss=0.09338, over 3311118.13 frames. ], batch size: 87, lr: 9.34e-03, grad_scale: 281474976710656.0 2024-08-11 02:45:37,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=873590.0, ans=0.2 2024-08-11 02:45:38,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873590.0, ans=0.1 2024-08-11 02:45:48,847 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 02:45:57,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=873690.0, ans=0.0 2024-08-11 02:46:04,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=873790.0, ans=0.125 2024-08-11 02:46:23,191 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-11 02:46:24,790 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.593e+05 2024-08-11 02:46:35,423 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.580e+01 2.895e+01 3.398e+01 1.445e+02, threshold=5.790e+01, percent-clipped=1.0 2024-08-11 02:46:35,615 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 02:46:39,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=873990.0, ans=0.125 2024-08-11 02:46:39,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=873990.0, ans=0.125 2024-08-11 02:46:48,704 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 450, loss[loss=0.1297, beats_loss=0.008694, ecapa_loss=0.0002084, whisper_loss=0.119, over 17247.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01129, ecapa_loss=0.0002089, whisper_loss=0.09371, over 3424706.27 frames. ], batch size: 66, lr: 9.34e-03, grad_scale: 281474976710656.0 2024-08-11 02:46:52,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=874090.0, ans=0.125 2024-08-11 02:47:01,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=874090.0, ans=0.5 2024-08-11 02:47:08,820 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.19 vs. 
limit=15.0 2024-08-11 02:47:30,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=874290.0, ans=0.0 2024-08-11 02:47:32,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=874390.0, ans=0.125 2024-08-11 02:47:43,407 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 02:47:47,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=874490.0, ans=0.0 2024-08-11 02:47:49,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=874490.0, ans=0.125 2024-08-11 02:48:02,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 500, loss[loss=0.1028, beats_loss=0.01172, ecapa_loss=0.0001881, whisper_loss=0.08919, over 22905.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01129, ecapa_loss=0.000208, whisper_loss=0.09421, over 3548263.62 frames. ], batch size: 90, lr: 9.34e-03, grad_scale: 281474976710656.0 2024-08-11 02:48:04,742 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. 
limit=6.0 2024-08-11 02:48:24,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=874690.0, ans=0.125 2024-08-11 02:48:31,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=874790.0, ans=0.0 2024-08-11 02:48:35,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=874790.0, ans=0.125 2024-08-11 02:48:41,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=12.0 2024-08-11 02:48:58,382 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.274e+01 2.783e+01 3.369e+01 3.762e+01 6.753e+01, threshold=6.739e+01, percent-clipped=3.0 2024-08-11 02:49:03,819 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 02:49:10,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 550, loss[loss=0.1317, beats_loss=0.008524, ecapa_loss=0.0002195, whisper_loss=0.121, over 14774.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01132, ecapa_loss=0.0002064, whisper_loss=0.09415, over 3594437.26 frames. ], batch size: 56, lr: 9.33e-03, grad_scale: 281474976710656.0 2024-08-11 02:49:13,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=875090.0, ans=0.5 2024-08-11 02:49:14,842 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 33 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 02:49:21,143 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
17 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-11 02:49:31,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=875190.0, ans=0.2 2024-08-11 02:49:38,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=875290.0, ans=0.2 2024-08-11 02:49:41,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=875290.0, ans=0.2 2024-08-11 02:50:15,324 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 600, loss[loss=0.08284, beats_loss=0.01363, ecapa_loss=0.0001989, whisper_loss=0.06722, over 18601.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01138, ecapa_loss=0.0002044, whisper_loss=0.09343, over 3651372.91 frames. ], batch size: 77, lr: 9.33e-03, grad_scale: 281474976710656.0 2024-08-11 02:50:18,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=875590.0, ans=0.125 2024-08-11 02:50:20,992 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 02:50:24,039 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0 2024-08-11 02:50:27,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=875690.0, ans=0.1 2024-08-11 02:50:31,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=875690.0, ans=0.035 2024-08-11 02:50:50,840 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
19 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 02:50:51,068 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 02:50:53,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=875890.0, ans=0.1 2024-08-11 02:51:06,442 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 02:51:09,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.703e+01 3.008e+01 3.347e+01 4.794e+01, threshold=6.016e+01, percent-clipped=0.0 2024-08-11 02:51:11,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=12.0 2024-08-11 02:51:17,917 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.36 vs. limit=12.0 2024-08-11 02:51:18,400 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-11 02:51:20,959 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 650, loss[loss=0.1056, beats_loss=0.009205, ecapa_loss=0.0002624, whisper_loss=0.09382, over 15389.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01132, ecapa_loss=0.0002047, whisper_loss=0.09368, over 3702909.90 frames. ], batch size: 63, lr: 9.33e-03, grad_scale: 281474976710656.0 2024-08-11 02:51:37,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=876190.0, ans=0.125 2024-08-11 02:51:45,726 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 02:51:58,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.23 vs. 
limit=22.5 2024-08-11 02:52:12,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=876490.0, ans=0.125 2024-08-11 02:52:15,307 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.89 vs. limit=15.0 2024-08-11 02:52:26,199 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 700, loss[loss=0.08949, beats_loss=0.01284, ecapa_loss=0.000231, whisper_loss=0.07434, over 19213.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01144, ecapa_loss=0.0002048, whisper_loss=0.09266, over 3762611.66 frames. ], batch size: 81, lr: 9.33e-03, grad_scale: 281474976710656.0 2024-08-11 02:52:27,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=876590.0, ans=0.0 2024-08-11 02:52:39,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=876690.0, ans=0.0 2024-08-11 02:52:58,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=876790.0, ans=0.125 2024-08-11 02:53:05,584 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 02:53:06,899 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-11 02:53:09,413 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
25 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 02:53:09,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=876890.0, ans=0.0 2024-08-11 02:53:19,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.856e+01 3.234e+01 3.790e+01 5.945e+01, threshold=6.469e+01, percent-clipped=0.0 2024-08-11 02:53:30,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=877090.0, ans=0.125 2024-08-11 02:53:31,291 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 750, loss[loss=0.1114, beats_loss=0.01093, ecapa_loss=0.0002578, whisper_loss=0.09785, over 22188.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01141, ecapa_loss=0.0002036, whisper_loss=0.09248, over 3751966.63 frames. ], batch size: 93, lr: 9.32e-03, grad_scale: 281474976710656.0 2024-08-11 02:53:51,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=877190.0, ans=0.125 2024-08-11 02:53:55,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=877190.0, ans=0.125 2024-08-11 02:54:24,734 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 02:54:31,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=877490.0, ans=0.2 2024-08-11 02:54:33,801 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 02:54:36,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 800, loss[loss=0.06148, beats_loss=0.01346, ecapa_loss=0.0001982, whisper_loss=0.04603, over 15884.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01146, ecapa_loss=0.0002023, whisper_loss=0.09238, over 3773038.41 frames. 
], batch size: 64, lr: 9.32e-03, grad_scale: 281474976710656.0 2024-08-11 02:54:51,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877690.0, ans=0.1 2024-08-11 02:54:55,542 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 28 from Vox, 20 fro AS 2024-08-11 02:54:57,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=877690.0, ans=0.0 2024-08-11 02:54:58,333 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 02:54:59,679 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-11 02:54:59,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=877690.0, ans=0.0 2024-08-11 02:55:12,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=877790.0, ans=0.0 2024-08-11 02:55:19,312 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.272e-01 2024-08-11 02:55:21,984 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 33 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-11 02:55:22,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877890.0, ans=0.1 2024-08-11 02:55:26,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=877890.0, ans=0.125 2024-08-11 02:55:27,225 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
25 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-11 02:55:29,534 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.644e+01 2.972e+01 3.441e+01 7.984e+01, threshold=5.944e+01, percent-clipped=1.0 2024-08-11 02:55:34,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=877990.0, ans=0.1 2024-08-11 02:55:41,242 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 850, loss[loss=0.1107, beats_loss=0.0105, ecapa_loss=0.0002404, whisper_loss=0.09778, over 21354.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01144, ecapa_loss=0.0002025, whisper_loss=0.09163, over 3804049.92 frames. ], batch size: 88, lr: 9.32e-03, grad_scale: 281474976710656.0 2024-08-11 02:55:47,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0 2024-08-11 02:55:48,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=878090.0, ans=0.0 2024-08-11 02:55:48,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=878090.0, ans=0.125 2024-08-11 02:55:53,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=878190.0, ans=0.2 2024-08-11 02:55:59,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=878190.0, ans=0.0 2024-08-11 02:56:32,806 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-11 02:56:35,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=878490.0, ans=0.0 2024-08-11 02:56:39,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=878490.0, ans=0.0 2024-08-11 02:56:45,628 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 900, loss[loss=0.09933, beats_loss=0.01334, ecapa_loss=0.0001796, whisper_loss=0.08419, over 15326.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01141, ecapa_loss=0.0002011, whisper_loss=0.09217, over 3787899.24 frames. ], batch size: 60, lr: 9.32e-03, grad_scale: 281474976710656.0 2024-08-11 02:56:56,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=878590.0, ans=0.5 2024-08-11 02:56:57,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=878690.0, ans=0.2 2024-08-11 02:57:11,040 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 02:57:11,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=878790.0, ans=0.0 2024-08-11 02:57:12,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=12.0 2024-08-11 02:57:21,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=878790.0, ans=0.125 2024-08-11 02:57:30,486 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
23 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-11 02:57:38,809 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.614e+01 2.988e+01 3.449e+01 5.810e+01, threshold=5.976e+01, percent-clipped=0.0 2024-08-11 02:57:51,474 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 950, loss[loss=0.1006, beats_loss=0.00816, ecapa_loss=0.0001816, whisper_loss=0.09061, over 15319.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01147, ecapa_loss=0.0002006, whisper_loss=0.0921, over 3810774.69 frames. ], batch size: 57, lr: 9.31e-03, grad_scale: 281474976710656.0 2024-08-11 02:57:51,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=879090.0, ans=0.125 2024-08-11 02:58:05,526 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 02:58:19,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2024-08-11 02:58:28,996 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 02:58:40,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=879390.0, ans=0.2 2024-08-11 02:58:45,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=879490.0, ans=0.1 2024-08-11 02:58:53,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=879490.0, ans=0.0 2024-08-11 02:59:00,830 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1000, loss[loss=0.1009, beats_loss=0.01065, ecapa_loss=0.0002521, whisper_loss=0.08774, over 17153.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01147, ecapa_loss=0.0002002, whisper_loss=0.09221, over 3780761.24 frames. 
], batch size: 72, lr: 9.31e-03, grad_scale: 281474976710656.0 2024-08-11 02:59:08,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=879590.0, ans=0.125 2024-08-11 02:59:12,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=879590.0, ans=0.125 2024-08-11 02:59:30,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=879790.0, ans=0.2 2024-08-11 02:59:41,823 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 02:59:55,413 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-11 03:00:01,724 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.796e+01 3.092e+01 3.418e+01 4.355e+01, threshold=6.184e+01, percent-clipped=0.0 2024-08-11 03:00:06,885 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 33 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-11 03:00:12,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=880090.0, ans=0.125 2024-08-11 03:00:13,915 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1050, loss[loss=0.09368, beats_loss=0.008752, ecapa_loss=0.0002272, whisper_loss=0.08266, over 14268.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01146, ecapa_loss=0.0002013, whisper_loss=0.0927, over 3810144.46 frames. 
], batch size: 57, lr: 9.31e-03, grad_scale: 562949953421312.0 2024-08-11 03:00:14,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=880090.0, ans=0.2 2024-08-11 03:00:14,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=880090.0, ans=0.09899494936611666 2024-08-11 03:00:37,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=880190.0, ans=0.0 2024-08-11 03:00:38,548 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 03:00:48,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=880290.0, ans=0.125 2024-08-11 03:00:59,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880390.0, ans=0.1 2024-08-11 03:01:02,538 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 03:01:08,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=880390.0, ans=0.125 2024-08-11 03:01:08,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-11 03:01:10,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.89 vs. 
limit=15.0 2024-08-11 03:01:13,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=880490.0, ans=0.0 2024-08-11 03:01:27,253 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1100, loss[loss=0.09782, beats_loss=0.01049, ecapa_loss=0.0002756, whisper_loss=0.08457, over 21555.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01143, ecapa_loss=0.0002008, whisper_loss=0.09263, over 3802100.83 frames. ], batch size: 93, lr: 9.31e-03, grad_scale: 562949953421312.0 2024-08-11 03:01:28,716 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 03:01:38,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=880590.0, ans=0.1 2024-08-11 03:02:01,987 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 03:02:12,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=880890.0, ans=0.05 2024-08-11 03:02:23,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=880890.0, ans=0.125 2024-08-11 03:02:25,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=880990.0, ans=0.125 2024-08-11 03:02:27,957 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.648e+01 3.166e+01 3.461e+01 5.758e+01, threshold=6.333e+01, percent-clipped=0.0 2024-08-11 03:02:29,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=880990.0, ans=0.125 2024-08-11 03:02:31,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=880990.0, ans=0.2 2024-08-11 
03:02:40,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=881090.0, ans=0.125 2024-08-11 03:02:40,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1150, loss[loss=0.1199, beats_loss=0.008854, ecapa_loss=0.0002128, whisper_loss=0.1089, over 23206.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01146, ecapa_loss=0.0002, whisper_loss=0.09248, over 3806431.89 frames. ], batch size: 92, lr: 9.30e-03, grad_scale: 562949953421312.0 2024-08-11 03:02:53,634 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-11 03:02:59,203 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 03:03:14,610 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.29 vs. limit=15.0 2024-08-11 03:03:15,444 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-11 03:03:21,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=881290.0, ans=0.125 2024-08-11 03:03:34,082 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-11 03:03:34,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=881390.0, ans=0.125 2024-08-11 03:03:38,776 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 17 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 03:03:42,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.95 vs. limit=12.0 2024-08-11 03:03:50,331 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
16 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 03:03:52,790 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1200, loss[loss=0.08671, beats_loss=0.01499, ecapa_loss=0.0002005, whisper_loss=0.06972, over 21559.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01145, ecapa_loss=0.0001988, whisper_loss=0.09232, over 3788818.53 frames. ], batch size: 88, lr: 9.30e-03, grad_scale: 562949953421312.0 2024-08-11 03:04:04,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=881590.0, ans=0.0 2024-08-11 03:04:13,939 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2024-08-11 03:04:48,271 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 03:04:48,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=881890.0, ans=0.125 2024-08-11 03:04:52,465 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.507e+01 2.887e+01 3.348e+01 4.586e+01, threshold=5.774e+01, percent-clipped=0.0 2024-08-11 03:04:57,459 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 19 from LS+wenet, 33 from Vox, 38 fro AS 2024-08-11 03:05:03,794 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.57 vs. limit=15.0 2024-08-11 03:05:04,712 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 03:05:05,667 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1250, loss[loss=0.1108, beats_loss=0.0109, ecapa_loss=0.0002088, whisper_loss=0.09777, over 21083.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01136, ecapa_loss=0.0002005, whisper_loss=0.09263, over 3803517.72 frames. 
], batch size: 87, lr: 9.30e-03, grad_scale: 562949953421312.0 2024-08-11 03:05:08,280 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2024-08-11 03:05:10,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=882090.0, ans=0.0 2024-08-11 03:05:14,033 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.16 vs. limit=10.0 2024-08-11 03:05:29,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=882190.0, ans=0.125 2024-08-11 03:05:58,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=882390.0, ans=0.1 2024-08-11 03:06:00,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=882390.0, ans=0.125 2024-08-11 03:06:09,233 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.42 vs. limit=15.0 2024-08-11 03:06:20,277 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1300, loss[loss=0.1042, beats_loss=0.01282, ecapa_loss=0.000176, whisper_loss=0.08959, over 23626.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01137, ecapa_loss=0.0002002, whisper_loss=0.0924, over 3798830.58 frames. ], batch size: 92, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:06:26,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=882590.0, ans=0.125 2024-08-11 03:06:32,050 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 03:06:45,098 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 03:07:13,618 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 03:07:20,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.642e+01 3.016e+01 3.566e+01 8.330e+01, threshold=6.031e+01, percent-clipped=1.0 2024-08-11 03:07:21,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.64 vs. limit=22.5 2024-08-11 03:07:30,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=882990.0, ans=0.2 2024-08-11 03:07:33,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=883090.0, ans=0.125 2024-08-11 03:07:34,485 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1350, loss[loss=0.117, beats_loss=0.00964, ecapa_loss=0.0002295, whisper_loss=0.1051, over 16230.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0114, ecapa_loss=0.0002003, whisper_loss=0.09268, over 3812976.25 frames. ], batch size: 67, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:07:39,192 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 20 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 03:08:02,865 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 03:08:09,316 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 03:08:12,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=883290.0, ans=0.05 2024-08-11 03:08:38,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.14 vs. limit=22.5 2024-08-11 03:08:48,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1400, loss[loss=0.107, beats_loss=0.01004, ecapa_loss=0.0002318, whisper_loss=0.09469, over 21617.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01138, ecapa_loss=0.0001995, whisper_loss=0.09246, over 3809398.61 frames. ], batch size: 88, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:08:48,964 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2024-08-11 03:08:50,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=883590.0, ans=0.125 2024-08-11 03:08:58,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=883590.0, ans=0.0 2024-08-11 03:09:07,052 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 03:09:26,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=883790.0, ans=0.025 2024-08-11 03:09:49,710 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.657e+01 3.071e+01 3.496e+01 6.029e+01, threshold=6.143e+01, percent-clipped=0.0 2024-08-11 03:09:51,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=883990.0, ans=0.125 2024-08-11 03:09:54,458 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-11 03:10:36,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=884090.0, ans=0.125 2024-08-11 03:10:37,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1450, loss[loss=0.1115, beats_loss=0.009143, ecapa_loss=0.0001788, whisper_loss=0.1006, over 14269.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01136, ecapa_loss=0.0002004, whisper_loss=0.09307, over 3825984.38 frames. ], batch size: 54, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:10:56,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=884190.0, ans=0.0 2024-08-11 03:11:38,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=884490.0, ans=0.125 2024-08-11 03:11:47,240 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 03:11:51,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=884590.0, ans=0.0 2024-08-11 03:11:53,064 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1500, loss[loss=0.1256, beats_loss=0.008711, ecapa_loss=0.0002389, whisper_loss=0.1145, over 14512.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01142, ecapa_loss=0.0001986, whisper_loss=0.09283, over 3812491.30 frames. ], batch size: 58, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:11:56,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=884590.0, ans=0.1 2024-08-11 03:11:57,441 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 03:12:03,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=884590.0, ans=0.125 2024-08-11 03:12:03,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=884590.0, ans=0.0 2024-08-11 03:12:25,333 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-11 03:12:37,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=884890.0, ans=0.0 2024-08-11 03:12:48,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=884890.0, ans=0.0 2024-08-11 03:12:53,737 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.731e+01 3.107e+01 3.593e+01 6.683e+01, threshold=6.214e+01, percent-clipped=1.0 2024-08-11 03:13:05,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=884990.0, ans=0.2 2024-08-11 03:13:07,794 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1550, loss[loss=0.1248, beats_loss=0.009268, ecapa_loss=0.0002153, whisper_loss=0.1134, over 19259.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01145, ecapa_loss=0.0001988, whisper_loss=0.09265, over 3812707.15 frames. ], batch size: 72, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:13:08,215 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 03:13:10,880 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 03:13:17,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.16 vs. 
limit=22.5 2024-08-11 03:13:24,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0 2024-08-11 03:13:30,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=885190.0, ans=0.125 2024-08-11 03:13:43,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=885290.0, ans=0.125 2024-08-11 03:13:50,494 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.586e-02 2024-08-11 03:14:14,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=885490.0, ans=0.5 2024-08-11 03:14:17,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=885490.0, ans=0.0 2024-08-11 03:14:19,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.38 vs. limit=22.5 2024-08-11 03:14:21,457 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1600, loss[loss=0.1198, beats_loss=0.01103, ecapa_loss=0.0001997, whisper_loss=0.1068, over 16676.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01134, ecapa_loss=0.0001987, whisper_loss=0.0932, over 3821284.30 frames. ], batch size: 63, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:14:23,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=885590.0, ans=0.1 2024-08-11 03:14:32,062 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-11 03:14:41,603 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 03:14:44,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=885690.0, ans=0.125 2024-08-11 03:14:45,684 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 03:14:47,915 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=15.0 2024-08-11 03:14:53,443 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 03:15:10,651 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 03:15:13,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5 2024-08-11 03:15:21,787 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.608e+01 2.973e+01 3.361e+01 6.559e+01, threshold=5.946e+01, percent-clipped=1.0 2024-08-11 03:15:34,140 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1650, loss[loss=0.1119, beats_loss=0.01246, ecapa_loss=0.0001592, whisper_loss=0.09781, over 16523.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01133, ecapa_loss=0.0001978, whisper_loss=0.09335, over 3842497.52 frames. ], batch size: 63, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:15:37,446 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 03:15:47,854 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 03:15:54,771 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
12 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 03:15:55,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.19 vs. limit=22.5 2024-08-11 03:15:56,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=886190.0, ans=0.0 2024-08-11 03:16:11,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=886290.0, ans=0.5 2024-08-11 03:16:16,694 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-11 03:16:17,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=22.5 2024-08-11 03:16:34,455 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 31 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 03:16:44,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1700, loss[loss=0.09475, beats_loss=0.01306, ecapa_loss=0.0001409, whisper_loss=0.08028, over 17154.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01131, ecapa_loss=0.0001974, whisper_loss=0.09349, over 3838479.96 frames. ], batch size: 62, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:17:03,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=886690.0, ans=0.125 2024-08-11 03:17:03,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886690.0, ans=0.1 2024-08-11 03:17:11,231 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 19 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 03:17:18,142 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 03:17:19,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886790.0, ans=0.1 2024-08-11 03:17:23,638 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 03:17:28,006 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 03:17:42,206 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.697e+01 3.081e+01 3.373e+01 4.997e+01, threshold=6.161e+01, percent-clipped=0.0 2024-08-11 03:17:55,103 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1750, loss[loss=0.08321, beats_loss=0.0135, ecapa_loss=0.0002123, whisper_loss=0.06759, over 20681.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01134, ecapa_loss=0.0001969, whisper_loss=0.09342, over 3852980.00 frames. ], batch size: 87, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:18:00,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=887090.0, ans=0.125 2024-08-11 03:18:06,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=887090.0, ans=0.2 2024-08-11 03:18:14,975 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 03:18:17,957 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 03:18:43,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=887390.0, ans=0.1 2024-08-11 03:18:46,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=887390.0, ans=0.125 2024-08-11 03:18:49,779 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 03:19:03,014 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1800, loss[loss=0.09947, beats_loss=0.01254, ecapa_loss=0.000215, whisper_loss=0.08478, over 15215.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01129, ecapa_loss=0.0001972, whisper_loss=0.09381, over 3852791.35 frames. ], batch size: 61, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:19:04,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=887590.0, ans=0.125 2024-08-11 03:19:07,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.01 vs. limit=22.5 2024-08-11 03:19:18,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=887690.0, ans=0.0 2024-08-11 03:19:19,669 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 03:19:22,746 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2024-08-11 03:19:27,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=22.5 2024-08-11 03:19:28,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.08 vs. limit=15.0 2024-08-11 03:19:33,943 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.468e+05 2024-08-11 03:19:36,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=887790.0, ans=0.1 2024-08-11 03:19:41,798 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
18 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 03:19:42,958 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 03:19:55,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=887890.0, ans=0.1 2024-08-11 03:20:00,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.601e+01 2.973e+01 3.471e+01 4.949e+01, threshold=5.947e+01, percent-clipped=0.0 2024-08-11 03:20:13,290 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1850, loss[loss=0.1286, beats_loss=0.009955, ecapa_loss=0.000161, whisper_loss=0.117, over 21734.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01127, ecapa_loss=0.0001983, whisper_loss=0.09378, over 3834579.54 frames. ], batch size: 77, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:20:33,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=888190.0, ans=10.0 2024-08-11 03:20:53,596 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 03:21:01,745 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 03:21:14,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.05 vs. limit=22.5 2024-08-11 03:21:22,138 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1900, loss[loss=0.115, beats_loss=0.01477, ecapa_loss=0.0001486, whisper_loss=0.09876, over 19606.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01129, ecapa_loss=0.0002013, whisper_loss=0.09304, over 3813952.36 frames. ], batch size: 74, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:21:35,685 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 03:21:39,623 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 35 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 03:21:42,130 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 11 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 03:21:46,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=888690.0, ans=0.1 2024-08-11 03:21:51,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=888790.0, ans=0.125 2024-08-11 03:21:59,769 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 03:22:01,056 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-11 03:22:13,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=888890.0, ans=0.025 2024-08-11 03:22:16,936 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+01 2.590e+01 3.002e+01 3.327e+01 6.064e+01, threshold=6.004e+01, percent-clipped=1.0 2024-08-11 03:22:19,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=888990.0, ans=0.0 2024-08-11 03:22:30,302 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 1950, loss[loss=0.1163, beats_loss=0.01182, ecapa_loss=0.0002566, whisper_loss=0.1019, over 23105.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0113, ecapa_loss=0.000204, whisper_loss=0.09308, over 3813306.54 frames. ], batch size: 94, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:22:34,424 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 03:23:04,152 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
23 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 03:23:15,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=889390.0, ans=0.125 2024-08-11 03:23:19,336 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 03:23:24,496 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 03:23:27,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=889490.0, ans=0.125 2024-08-11 03:23:29,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=889490.0, ans=0.2 2024-08-11 03:23:38,604 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2000, loss[loss=0.08265, beats_loss=0.01416, ecapa_loss=0.0002249, whisper_loss=0.06624, over 22097.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01139, ecapa_loss=0.0002051, whisper_loss=0.0928, over 3795773.18 frames. ], batch size: 94, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:23:42,585 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 03:23:47,763 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.21 vs. limit=15.0 2024-08-11 03:24:03,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=889690.0, ans=0.2 2024-08-11 03:24:12,757 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
28 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 03:24:22,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=889890.0, ans=0.125 2024-08-11 03:24:23,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=889890.0, ans=0.125 2024-08-11 03:24:34,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.753e+01 3.127e+01 3.595e+01 5.672e+01, threshold=6.254e+01, percent-clipped=0.0 2024-08-11 03:24:37,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.34 vs. limit=10.0 2024-08-11 03:24:47,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2050, loss[loss=0.09573, beats_loss=0.0136, ecapa_loss=0.0001938, whisper_loss=0.0802, over 20488.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01146, ecapa_loss=0.0002045, whisper_loss=0.09332, over 3828132.64 frames. ], batch size: 83, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:25:05,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=890190.0, ans=0.0 2024-08-11 03:25:06,580 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2024-08-11 03:25:09,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=890190.0, ans=0.125 2024-08-11 03:25:39,902 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.76 vs. 
limit=22.5 2024-08-11 03:25:59,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=890590.0, ans=0.125 2024-08-11 03:26:00,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2100, loss[loss=0.1047, beats_loss=0.01177, ecapa_loss=0.0002383, whisper_loss=0.09054, over 23135.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01152, ecapa_loss=0.0002049, whisper_loss=0.09249, over 3844766.11 frames. ], batch size: 94, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:26:00,755 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-11 03:26:03,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=890590.0, ans=0.125 2024-08-11 03:26:06,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=890590.0, ans=0.2 2024-08-11 03:26:28,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=890690.0, ans=0.125 2024-08-11 03:26:52,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=890890.0, ans=0.125 2024-08-11 03:26:54,558 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 03:26:54,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=890890.0, ans=10.0 2024-08-11 03:26:57,191 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.973e+02 2024-08-11 03:27:01,092 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.655e+01 3.007e+01 3.449e+01 4.820e+01, threshold=6.014e+01, percent-clipped=0.0 2024-08-11 03:27:01,309 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
26 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 03:27:01,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=890990.0, ans=10.0 2024-08-11 03:27:07,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=890990.0, ans=0.0 2024-08-11 03:27:14,300 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2150, loss[loss=0.1005, beats_loss=0.01181, ecapa_loss=0.000208, whisper_loss=0.08666, over 17567.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01152, ecapa_loss=0.0002064, whisper_loss=0.0927, over 3819981.13 frames. ], batch size: 72, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:27:19,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891090.0, ans=0.1 2024-08-11 03:27:42,820 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 03:28:01,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2024-08-11 03:28:02,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2024-08-11 03:28:02,702 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-11 03:28:02,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=891390.0, ans=0.1 2024-08-11 03:28:10,567 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.563e-02 2024-08-11 03:28:20,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=891490.0, ans=0.2 2024-08-11 03:28:26,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2200, loss[loss=0.1032, beats_loss=0.01184, ecapa_loss=0.0001593, whisper_loss=0.08981, over 14787.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01145, ecapa_loss=0.0002068, whisper_loss=0.09351, over 3819219.02 frames. ], batch size: 56, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:28:26,749 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 03:28:32,196 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 03:28:34,820 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 03:28:45,214 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
24 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 03:28:52,852 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:29:00,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=891790.0, ans=0.0 2024-08-11 03:29:08,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=891790.0, ans=0.05 2024-08-11 03:29:13,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=891890.0, ans=0.125 2024-08-11 03:29:18,522 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 03:29:27,786 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.671e+01 3.021e+01 3.496e+01 5.518e+01, threshold=6.042e+01, percent-clipped=0.0 2024-08-11 03:29:29,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=891990.0, ans=0.125 2024-08-11 03:29:35,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=891990.0, ans=0.125 2024-08-11 03:29:40,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2250, loss[loss=0.1175, beats_loss=0.01024, ecapa_loss=0.0001983, whisper_loss=0.1053, over 20900.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.0114, ecapa_loss=0.0002073, whisper_loss=0.09425, over 3861492.28 frames. ], batch size: 78, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:29:53,047 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=22.5 2024-08-11 03:29:56,133 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 03:29:57,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=892190.0, ans=0.125 2024-08-11 03:30:03,362 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 29 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-11 03:30:08,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=892290.0, ans=0.125 2024-08-11 03:30:17,873 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-11 03:30:25,739 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 03:30:38,243 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-11 03:30:39,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=892490.0, ans=0.125 2024-08-11 03:30:43,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=892490.0, ans=0.0 2024-08-11 03:30:44,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=892490.0, ans=0.125 2024-08-11 03:30:52,498 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2300, loss[loss=0.1121, beats_loss=0.01377, ecapa_loss=0.0002026, whisper_loss=0.09627, over 20581.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01155, ecapa_loss=0.0002072, whisper_loss=0.09394, over 3899109.04 frames. ], batch size: 80, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:30:59,449 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
14 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 03:31:14,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=892690.0, ans=0.0 2024-08-11 03:31:26,301 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-11 03:31:30,613 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 03:31:30,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892790.0, ans=0.1 2024-08-11 03:31:42,914 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 33 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 03:31:48,426 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-11 03:31:54,829 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.862e+01 3.270e+01 3.564e+01 5.997e+01, threshold=6.539e+01, percent-clipped=0.0 2024-08-11 03:32:08,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=893090.0, ans=0.95 2024-08-11 03:32:09,170 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2350, loss[loss=0.08698, beats_loss=0.01111, ecapa_loss=0.0002518, whisper_loss=0.07335, over 19828.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01145, ecapa_loss=0.0002087, whisper_loss=0.09432, over 3899847.66 frames. ], batch size: 81, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:32:18,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=893090.0, ans=0.125 2024-08-11 03:32:22,506 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 03:32:29,973 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 03:32:35,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0 2024-08-11 03:32:45,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=893290.0, ans=0.125 2024-08-11 03:32:45,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=893290.0, ans=0.0 2024-08-11 03:33:02,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=893390.0, ans=0.1 2024-08-11 03:33:18,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=893490.0, ans=0.1 2024-08-11 03:33:25,644 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2400, loss[loss=0.09549, beats_loss=0.01109, ecapa_loss=0.000199, whisper_loss=0.0824, over 15153.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01143, ecapa_loss=0.0002089, whisper_loss=0.09381, over 3872957.97 frames. ], batch size: 58, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:33:28,453 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 03:33:28,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=893590.0, ans=0.125 2024-08-11 03:33:33,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=893590.0, ans=0.2 2024-08-11 03:33:42,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=893690.0, ans=0.2 2024-08-11 03:33:49,627 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
17 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-11 03:34:28,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.589e+01 2.936e+01 3.311e+01 5.160e+01, threshold=5.871e+01, percent-clipped=0.0 2024-08-11 03:34:40,979 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-11 03:34:44,118 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2450, loss[loss=0.1251, beats_loss=0.009481, ecapa_loss=0.0001726, whisper_loss=0.1139, over 22920.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01148, ecapa_loss=0.0002084, whisper_loss=0.09329, over 3860160.33 frames. ], batch size: 86, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:34:54,779 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 03:34:55,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.33 vs. limit=22.5 2024-08-11 03:35:00,571 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 03:35:22,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.31 vs. limit=10.0 2024-08-11 03:35:26,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=894290.0, ans=0.0 2024-08-11 03:35:28,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=894390.0, ans=0.125 2024-08-11 03:35:28,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=894390.0, ans=0.0 2024-08-11 03:35:30,075 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
22 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-11 03:35:30,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=894390.0, ans=0.125 2024-08-11 03:35:31,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=894390.0, ans=0.1 2024-08-11 03:35:39,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=894390.0, ans=0.125 2024-08-11 03:35:49,548 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 03:35:58,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2500, loss[loss=0.123, beats_loss=0.01089, ecapa_loss=0.0002168, whisper_loss=0.11, over 22830.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01143, ecapa_loss=0.0002093, whisper_loss=0.09317, over 3861007.90 frames. ], batch size: 92, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:36:00,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894590.0, ans=0.1 2024-08-11 03:36:29,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=894790.0, ans=0.0 2024-08-11 03:36:30,709 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 03:36:35,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894790.0, ans=0.1 2024-08-11 03:36:53,476 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
14 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-11 03:37:03,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.710e+01 3.039e+01 3.423e+01 5.787e+01, threshold=6.079e+01, percent-clipped=0.0 2024-08-11 03:37:07,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894990.0, ans=0.1 2024-08-11 03:37:12,414 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 14 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 03:37:16,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2550, loss[loss=0.1138, beats_loss=0.0102, ecapa_loss=0.0002094, whisper_loss=0.1015, over 23621.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01143, ecapa_loss=0.0002075, whisper_loss=0.0931, over 3896926.45 frames. ], batch size: 94, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:37:25,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=895090.0, ans=0.1 2024-08-11 03:37:25,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=895090.0, ans=0.0 2024-08-11 03:37:26,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=895090.0, ans=0.1 2024-08-11 03:37:30,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=895090.0, ans=0.125 2024-08-11 03:37:51,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=895290.0, ans=0.125 2024-08-11 03:38:06,434 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 03:38:17,626 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
24 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-11 03:38:33,471 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2600, loss[loss=0.1357, beats_loss=0.009331, ecapa_loss=0.0002144, whisper_loss=0.1243, over 16729.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01145, ecapa_loss=0.0002072, whisper_loss=0.0934, over 3901754.82 frames. ], batch size: 64, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:38:44,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=895590.0, ans=0.125 2024-08-11 03:38:50,861 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 03:38:57,110 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:38:58,804 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 16 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 03:39:09,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=895790.0, ans=0.125 2024-08-11 03:39:15,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=895790.0, ans=0.125 2024-08-11 03:39:17,385 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.463e-01 2024-08-11 03:39:36,697 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.572e+01 2.896e+01 3.197e+01 4.923e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-11 03:39:40,241 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 03:39:50,277 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2650, loss[loss=0.1104, beats_loss=0.01158, ecapa_loss=0.0002426, whisper_loss=0.0964, over 14653.00 frames. 
], tot_loss[loss=0.107, beats_loss=0.0114, ecapa_loss=0.0002091, whisper_loss=0.09351, over 3882937.07 frames. ], batch size: 61, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:40:02,879 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 32 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 03:40:13,055 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 26 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 03:40:21,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896290.0, ans=0.1 2024-08-11 03:40:52,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=896490.0, ans=0.125 2024-08-11 03:40:53,079 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-11 03:40:57,140 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-11 03:41:01,951 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 24 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-11 03:41:05,855 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2700, loss[loss=0.09553, beats_loss=0.01368, ecapa_loss=0.0002097, whisper_loss=0.07976, over 22121.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01145, ecapa_loss=0.0002088, whisper_loss=0.09315, over 3898046.17 frames. ], batch size: 91, lr: 9.22e-03, grad_scale: 562949953421312.0 2024-08-11 03:41:09,404 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 28 from LS+wenet, 21 from Vox, 14 fro AS 2024-08-11 03:41:54,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=896890.0, ans=0.125 2024-08-11 03:41:58,175 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
25 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-11 03:42:05,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=896890.0, ans=0.0 2024-08-11 03:42:11,726 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.641e+01 2.955e+01 3.583e+01 6.037e+01, threshold=5.910e+01, percent-clipped=1.0 2024-08-11 03:42:25,483 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2750, loss[loss=0.1028, beats_loss=0.01172, ecapa_loss=0.0001795, whisper_loss=0.08931, over 21081.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01146, ecapa_loss=0.0002075, whisper_loss=0.09242, over 3876490.67 frames. ], batch size: 82, lr: 9.22e-03, grad_scale: 562949953421312.0 2024-08-11 03:42:53,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=897190.0, ans=0.125 2024-08-11 03:43:01,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=897290.0, ans=0.035 2024-08-11 03:43:15,530 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 15 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-11 03:43:17,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=897390.0, ans=0.125 2024-08-11 03:43:43,795 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2800, loss[loss=0.1148, beats_loss=0.01217, ecapa_loss=0.0001927, whisper_loss=0.1007, over 23198.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01148, ecapa_loss=0.0002059, whisper_loss=0.09292, over 3883734.23 frames. 
], batch size: 93, lr: 9.22e-03, grad_scale: 562949953421312.0 2024-08-11 03:44:09,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=897690.0, ans=0.125 2024-08-11 03:44:48,658 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.147e+01 2.710e+01 2.962e+01 3.650e+01 5.339e+01, threshold=5.923e+01, percent-clipped=0.0 2024-08-11 03:45:02,303 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2850, loss[loss=0.1129, beats_loss=0.01209, ecapa_loss=0.0001536, whisper_loss=0.0993, over 23527.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01149, ecapa_loss=0.000206, whisper_loss=0.09334, over 3901761.08 frames. ], batch size: 90, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:45:08,295 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 03:45:46,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=12.0 2024-08-11 03:45:54,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=15.0 2024-08-11 03:45:55,897 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-11 03:46:04,604 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 03:46:25,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2900, loss[loss=0.1145, beats_loss=0.01113, ecapa_loss=0.0001954, whisper_loss=0.1015, over 19413.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01149, ecapa_loss=0.0002078, whisper_loss=0.09359, over 3926081.81 frames. ], batch size: 76, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:46:26,616 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
22 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-11 03:46:35,988 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 03:46:40,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.65 vs. limit=15.0 2024-08-11 03:47:04,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.10 vs. limit=22.5 2024-08-11 03:47:13,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=898890.0, ans=0.0 2024-08-11 03:47:30,825 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.297e+01 2.659e+01 2.989e+01 3.721e+01 7.203e+01, threshold=5.978e+01, percent-clipped=1.0 2024-08-11 03:47:31,074 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 03:47:36,853 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.878e-02 2024-08-11 03:47:38,352 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 35 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 03:47:42,826 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-11 03:47:45,983 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 2950, loss[loss=0.1004, beats_loss=0.01214, ecapa_loss=0.0001887, whisper_loss=0.08632, over 21673.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.0115, ecapa_loss=0.0002094, whisper_loss=0.09352, over 3937813.61 frames. ], batch size: 85, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:48:14,727 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 03:48:28,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=899290.0, ans=0.125 2024-08-11 03:48:39,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=899390.0, ans=0.125 2024-08-11 03:48:44,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=899390.0, ans=0.09899494936611666 2024-08-11 03:49:01,174 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 03:49:01,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=899490.0, ans=0.5 2024-08-11 03:49:04,395 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 03:49:08,459 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3000, loss[loss=0.1392, beats_loss=0.009976, ecapa_loss=0.0002235, whisper_loss=0.1269, over 21952.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01146, ecapa_loss=0.0002089, whisper_loss=0.0948, over 3960054.37 frames. ], batch size: 88, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:49:08,460 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 03:49:42,524 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([8.9100e-10, 2.6750e-02, 8.5654e-03, 1.0377e+00, 7.1570e-03, 4.4922e-02, 3.5041e-02, 2.4455e-02], device='cuda:2') 2024-08-11 03:49:48,965 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on ASR_libri: loss=0.2586, beats_loss=0, ecapa_loss=0.0006718, whisper_loss=0.2519, over 922467.00 frames. 
2024-08-11 03:50:03,336 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.9845, 1.7849, 1.7768, 1.2845], device='cuda:2') 2024-08-11 03:50:07,502 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on SV_voxceleb1: loss=0.005617, beats_loss=0, ecapa_loss=0.0005617, whisper_loss=0, over 939242.00 frames. 2024-08-11 03:52:03,539 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on AT_audioset: loss=0.02572, beats_loss=0.02572, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 03:52:03,543 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 03:52:06,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=899590.0, ans=0.95 2024-08-11 03:52:42,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=899790.0, ans=0.0 2024-08-11 03:52:45,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=899790.0, ans=0.0 2024-08-11 03:52:45,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=899790.0, ans=0.0 2024-08-11 03:52:53,384 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 03:53:01,895 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2024-08-11 03:53:03,346 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 03:53:03,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=899890.0, ans=0.125 2024-08-11 03:53:15,261 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.673e+01 3.038e+01 3.538e+01 6.757e+01, threshold=6.077e+01, percent-clipped=1.0 2024-08-11 03:53:18,470 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 03:53:24,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=899990.0, ans=0.1 2024-08-11 03:53:30,245 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3050, loss[loss=0.1304, beats_loss=0.01141, ecapa_loss=0.0001801, whisper_loss=0.1172, over 21335.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01146, ecapa_loss=0.0002101, whisper_loss=0.0945, over 3924026.23 frames. ], batch size: 81, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:53:30,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=900090.0, ans=0.0 2024-08-11 03:53:34,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=900090.0, ans=0.125 2024-08-11 03:53:39,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=900090.0, ans=0.125 2024-08-11 03:53:43,786 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.64 vs. limit=15.0 2024-08-11 03:53:45,607 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 03:53:50,692 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
21 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 03:53:57,517 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2024-08-11 03:54:07,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2024-08-11 03:54:50,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=15.0 2024-08-11 03:54:53,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=900490.0, ans=0.1 2024-08-11 03:54:57,813 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3100, loss[loss=0.103, beats_loss=0.01268, ecapa_loss=0.0002089, whisper_loss=0.08824, over 16371.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01149, ecapa_loss=0.0002103, whisper_loss=0.09401, over 3884847.97 frames. ], batch size: 66, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:55:02,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=900590.0, ans=0.0 2024-08-11 03:55:02,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=900590.0, ans=0.0 2024-08-11 03:55:10,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=900590.0, ans=0.125 2024-08-11 03:55:24,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-11 03:55:42,093 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
23 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 03:55:46,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=900890.0, ans=0.125 2024-08-11 03:55:51,902 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 03:55:52,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=900890.0, ans=0.0 2024-08-11 03:56:01,852 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-11 03:56:05,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.642e+01 2.994e+01 3.477e+01 5.395e+01, threshold=5.988e+01, percent-clipped=0.0 2024-08-11 03:56:18,388 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 03:56:20,005 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3150, loss[loss=0.113, beats_loss=0.01171, ecapa_loss=0.0002119, whisper_loss=0.09916, over 19535.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01145, ecapa_loss=0.0002105, whisper_loss=0.09487, over 3861633.68 frames. ], batch size: 75, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:56:20,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=901090.0, ans=0.07 2024-08-11 03:56:21,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=901090.0, ans=0.125 2024-08-11 03:56:41,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=901190.0, ans=0.0 2024-08-11 03:56:46,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=901190.0, ans=0.0 2024-08-11 03:56:47,701 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
24 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-11 03:56:51,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=901290.0, ans=0.125 2024-08-11 03:56:52,588 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-11 03:56:57,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=901290.0, ans=0.0 2024-08-11 03:57:15,467 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 03:57:16,025 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2024-08-11 03:57:17,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=901390.0, ans=0.125 2024-08-11 03:57:44,136 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3200, loss[loss=0.1116, beats_loss=0.0109, ecapa_loss=0.0001988, whisper_loss=0.0987, over 16175.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01145, ecapa_loss=0.0002114, whisper_loss=0.09471, over 3821008.94 frames. 
], batch size: 63, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:58:08,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=901690.0, ans=0.2 2024-08-11 03:58:25,144 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-11 03:58:50,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=901990.0, ans=0.1 2024-08-11 03:58:51,760 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.680e+01 2.966e+01 3.598e+01 6.746e+01, threshold=5.932e+01, percent-clipped=1.0 2024-08-11 03:58:54,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=901990.0, ans=0.125 2024-08-11 03:59:06,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3250, loss[loss=0.0885, beats_loss=0.01221, ecapa_loss=0.0001744, whisper_loss=0.07455, over 17451.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01143, ecapa_loss=0.0002114, whisper_loss=0.09499, over 3845252.91 frames. ], batch size: 66, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 03:59:22,967 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=15.0 2024-08-11 03:59:38,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=902290.0, ans=0.0 2024-08-11 03:59:39,616 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
23 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 03:59:39,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=902290.0, ans=0.1 2024-08-11 03:59:57,336 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.17 vs. limit=22.5 2024-08-11 03:59:58,162 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 03:59:58,668 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.16 vs. limit=22.5 2024-08-11 04:00:15,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=902490.0, ans=15.0 2024-08-11 04:00:19,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=902490.0, ans=0.125 2024-08-11 04:00:25,932 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3300, loss[loss=0.113, beats_loss=0.01221, ecapa_loss=0.0002189, whisper_loss=0.09863, over 22221.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01144, ecapa_loss=0.0002125, whisper_loss=0.09439, over 3849369.45 frames. ], batch size: 91, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 04:00:26,054 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 04:00:48,400 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.50 vs. 
limit=15.0 2024-08-11 04:01:03,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=902790.0, ans=0.0 2024-08-11 04:01:18,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=902890.0, ans=0.125 2024-08-11 04:01:25,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=902890.0, ans=0.1 2024-08-11 04:01:25,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.50 vs. limit=15.0 2024-08-11 04:01:28,040 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.15 vs. limit=22.5 2024-08-11 04:01:38,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.725e+01 3.245e+01 3.907e+01 7.359e+01, threshold=6.490e+01, percent-clipped=2.0 2024-08-11 04:01:47,941 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-11 04:01:52,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3350, loss[loss=0.1049, beats_loss=0.01241, ecapa_loss=0.0001815, whisper_loss=0.09069, over 15391.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01137, ecapa_loss=0.0002109, whisper_loss=0.09481, over 3851807.83 frames. 
], batch size: 58, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 04:01:59,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=903090.0, ans=0.125 2024-08-11 04:02:17,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=903190.0, ans=0.2 2024-08-11 04:02:24,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=903290.0, ans=0.125 2024-08-11 04:02:45,724 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2024-08-11 04:02:55,250 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-11 04:03:13,332 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3400, loss[loss=0.09848, beats_loss=0.01011, ecapa_loss=0.0001923, whisper_loss=0.08645, over 17195.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01132, ecapa_loss=0.0002116, whisper_loss=0.09513, over 3878786.35 frames. ], batch size: 68, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 04:03:27,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=903690.0, ans=0.1 2024-08-11 04:03:32,307 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 04:03:48,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=903790.0, ans=0.0 2024-08-11 04:03:48,902 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.46 vs. 
limit=12.0 2024-08-11 04:03:52,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=903790.0, ans=0.0 2024-08-11 04:04:11,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5 2024-08-11 04:04:13,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=903890.0, ans=0.0 2024-08-11 04:04:18,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.715e+01 3.132e+01 3.599e+01 6.001e+01, threshold=6.265e+01, percent-clipped=0.0 2024-08-11 04:04:23,876 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.027e-02 2024-08-11 04:04:31,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.16 vs. limit=10.0 2024-08-11 04:04:32,128 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3450, loss[loss=0.1125, beats_loss=0.01173, ecapa_loss=0.0002578, whisper_loss=0.09818, over 21720.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01121, ecapa_loss=0.0002132, whisper_loss=0.09538, over 3859380.30 frames. ], batch size: 89, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:04:44,940 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.14 vs. 
limit=12.0 2024-08-11 04:04:50,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=904190.0, ans=0.07 2024-08-11 04:05:15,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=904390.0, ans=0.0 2024-08-11 04:05:24,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=904390.0, ans=0.0 2024-08-11 04:05:33,881 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-08-11 04:05:42,345 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3500, loss[loss=0.1024, beats_loss=0.01322, ecapa_loss=0.0002103, whisper_loss=0.08706, over 20842.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01126, ecapa_loss=0.0002145, whisper_loss=0.09434, over 3860913.88 frames. ], batch size: 84, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:05:45,182 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:05:55,561 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-11 04:05:59,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=904690.0, ans=0.0 2024-08-11 04:06:06,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=904690.0, ans=0.125 2024-08-11 04:06:09,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.57 vs. 
limit=22.5 2024-08-11 04:06:19,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=904790.0, ans=0.5 2024-08-11 04:06:35,612 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+01 2.786e+01 3.047e+01 3.456e+01 6.070e+01, threshold=6.093e+01, percent-clipped=0.0 2024-08-11 04:06:37,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=904990.0, ans=0.1 2024-08-11 04:06:43,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=904990.0, ans=0.1 2024-08-11 04:06:47,059 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3550, loss[loss=0.09374, beats_loss=0.01211, ecapa_loss=0.0002415, whisper_loss=0.07922, over 17637.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01138, ecapa_loss=0.0002129, whisper_loss=0.09314, over 3873032.43 frames. ], batch size: 75, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:06:48,503 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 04:06:51,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=905090.0, ans=0.125 2024-08-11 04:07:16,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=905290.0, ans=0.0 2024-08-11 04:07:43,297 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.50 vs. limit=10.0 2024-08-11 04:07:53,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.56 vs. 
limit=15.0 2024-08-11 04:07:53,399 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3600, loss[loss=0.08559, beats_loss=0.01358, ecapa_loss=0.0002005, whisper_loss=0.07001, over 18910.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01137, ecapa_loss=0.0002128, whisper_loss=0.09357, over 3875390.50 frames. ], batch size: 78, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:07:59,970 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 04:08:05,398 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 04:08:09,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=905690.0, ans=0.125 2024-08-11 04:08:20,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=905790.0, ans=0.125 2024-08-11 04:08:21,271 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 04:08:22,634 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-11 04:08:25,228 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 04:08:30,231 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 04:08:39,292 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 04:08:47,157 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.715e+01 3.031e+01 3.500e+01 1.161e+02, threshold=6.062e+01, percent-clipped=1.0 2024-08-11 04:08:53,049 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
26 from LS+wenet, 7 from Vox, 30 fro AS 2024-08-11 04:08:59,311 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3650, loss[loss=0.1275, beats_loss=0.009923, ecapa_loss=0.0002241, whisper_loss=0.1154, over 22522.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01143, ecapa_loss=0.000212, whisper_loss=0.09365, over 3876196.22 frames. ], batch size: 91, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:09:00,882 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-11 04:09:03,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906090.0, ans=0.1 2024-08-11 04:09:21,061 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.67 vs. limit=15.0 2024-08-11 04:09:31,721 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-11 04:09:37,114 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-11 04:09:44,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=906390.0, ans=0.125 2024-08-11 04:10:03,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=906590.0, ans=0.125 2024-08-11 04:10:04,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3700, loss[loss=0.1196, beats_loss=0.01112, ecapa_loss=0.0002329, whisper_loss=0.1061, over 22170.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01143, ecapa_loss=0.0002111, whisper_loss=0.09345, over 3848966.96 frames. 
], batch size: 92, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:10:44,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=906890.0, ans=0.02 2024-08-11 04:10:57,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=906990.0, ans=0.125 2024-08-11 04:10:58,333 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.695e+01 3.038e+01 3.419e+01 5.061e+01, threshold=6.077e+01, percent-clipped=0.0 2024-08-11 04:11:10,825 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3750, loss[loss=0.08679, beats_loss=0.01385, ecapa_loss=0.0002005, whisper_loss=0.07094, over 19754.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01146, ecapa_loss=0.0002128, whisper_loss=0.09335, over 3845762.60 frames. ], batch size: 83, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:11:12,873 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2024-08-11 04:11:26,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=907190.0, ans=0.125 2024-08-11 04:11:39,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=907290.0, ans=0.0 2024-08-11 04:11:47,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=907290.0, ans=0.2 2024-08-11 04:11:51,010 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 13 from Vox, 54 fro AS 2024-08-11 04:11:57,754 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
14 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-11 04:11:58,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=907390.0, ans=0.0 2024-08-11 04:12:00,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=907390.0, ans=0.125 2024-08-11 04:12:01,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=907490.0, ans=0.125 2024-08-11 04:12:13,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=907490.0, ans=0.1 2024-08-11 04:12:16,157 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3800, loss[loss=0.1089, beats_loss=0.01089, ecapa_loss=0.0002492, whisper_loss=0.09551, over 16384.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01149, ecapa_loss=0.0002125, whisper_loss=0.09356, over 3843175.72 frames. ], batch size: 68, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:12:21,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=907590.0, ans=0.125 2024-08-11 04:12:34,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=15.0 2024-08-11 04:12:36,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.48 vs. limit=10.0 2024-08-11 04:12:37,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2024-08-11 04:12:44,637 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 04:13:01,630 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 04:13:09,636 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.721e+01 2.981e+01 3.416e+01 8.567e+01, threshold=5.961e+01, percent-clipped=1.0 2024-08-11 04:13:10,619 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.38 vs. limit=5.0 2024-08-11 04:13:22,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3850, loss[loss=0.1113, beats_loss=0.01207, ecapa_loss=0.0001919, whisper_loss=0.09733, over 20653.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01148, ecapa_loss=0.0002127, whisper_loss=0.09363, over 3832051.89 frames. ], batch size: 80, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:13:30,492 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 04:13:33,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.18 vs. limit=10.0 2024-08-11 04:13:50,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.61 vs. limit=15.0 2024-08-11 04:13:54,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=908290.0, ans=0.5 2024-08-11 04:13:59,994 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 04:14:15,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.26 vs. 
limit=15.0 2024-08-11 04:14:26,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=908490.0, ans=0.125 2024-08-11 04:14:29,758 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-11 04:14:31,479 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-11 04:14:32,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3900, loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0002485, whisper_loss=0.08889, over 14764.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01148, ecapa_loss=0.0002117, whisper_loss=0.09393, over 3860460.39 frames. ], batch size: 63, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:14:34,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=908590.0, ans=0.1 2024-08-11 04:14:35,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=908590.0, ans=0.0 2024-08-11 04:14:44,109 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 04:14:47,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=908690.0, ans=0.0 2024-08-11 04:14:53,553 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 04:14:57,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=908690.0, ans=0.1 2024-08-11 04:15:21,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=908890.0, ans=0.125 2024-08-11 04:15:32,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-11 04:15:32,875 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.684e+01 3.033e+01 3.679e+01 6.201e+01, threshold=6.065e+01, percent-clipped=1.0 2024-08-11 04:15:42,982 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 04:15:45,766 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 3950, loss[loss=0.1128, beats_loss=0.01145, ecapa_loss=0.0002037, whisper_loss=0.09933, over 21889.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01148, ecapa_loss=0.0002124, whisper_loss=0.09389, over 3884976.65 frames. ], batch size: 88, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:15:56,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=909090.0, ans=0.125 2024-08-11 04:16:41,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=909390.0, ans=0.0 2024-08-11 04:16:47,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=909490.0, ans=0.125 2024-08-11 04:16:49,751 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
30 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 04:16:50,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=909490.0, ans=0.035 2024-08-11 04:16:52,813 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-11 04:16:57,932 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 04:16:59,393 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4000, loss[loss=0.1198, beats_loss=0.01034, ecapa_loss=0.0001762, whisper_loss=0.1077, over 23303.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01146, ecapa_loss=0.0002111, whisper_loss=0.09367, over 3898913.34 frames. ], batch size: 91, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:17:09,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=909590.0, ans=0.2 2024-08-11 04:17:11,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=909590.0, ans=0.125 2024-08-11 04:17:20,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=909690.0, ans=0.125 2024-08-11 04:17:26,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=909690.0, ans=0.125 2024-08-11 04:17:35,622 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.87 vs. limit=8.0 2024-08-11 04:17:37,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=909790.0, ans=0.125 2024-08-11 04:17:44,914 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. 
limit=15.0 2024-08-11 04:17:53,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=909890.0, ans=15.0 2024-08-11 04:18:00,770 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.789e+01 3.181e+01 3.971e+01 6.202e+01, threshold=6.363e+01, percent-clipped=1.0 2024-08-11 04:18:06,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=909990.0, ans=0.125 2024-08-11 04:18:10,937 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 04:18:15,028 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4050, loss[loss=0.1017, beats_loss=0.01166, ecapa_loss=0.000257, whisper_loss=0.08751, over 22679.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01149, ecapa_loss=0.0002104, whisper_loss=0.0937, over 3912716.40 frames. ], batch size: 94, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:18:15,152 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 04:18:31,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=910190.0, ans=0.0 2024-08-11 04:18:35,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=910190.0, ans=0.125 2024-08-11 04:18:38,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=910190.0, ans=0.025 2024-08-11 04:18:42,822 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-11 04:18:47,961 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-11 04:19:15,226 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
23 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 04:19:28,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=910490.0, ans=0.125 2024-08-11 04:19:30,238 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4100, loss[loss=0.08923, beats_loss=0.0131, ecapa_loss=0.0001746, whisper_loss=0.07439, over 18125.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01139, ecapa_loss=0.0002114, whisper_loss=0.09487, over 3907548.79 frames. ], batch size: 71, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:19:37,151 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 04:20:06,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=910790.0, ans=0.0 2024-08-11 04:20:18,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=910890.0, ans=0.1 2024-08-11 04:20:20,725 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 04:20:31,833 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.728e+01 2.968e+01 3.426e+01 6.142e+01, threshold=5.935e+01, percent-clipped=0.0 2024-08-11 04:20:41,920 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 04:20:43,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=910990.0, ans=0.125 2024-08-11 04:20:46,147 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4150, loss[loss=0.1145, beats_loss=0.01085, ecapa_loss=0.0002148, whisper_loss=0.1015, over 22543.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01142, ecapa_loss=0.0002107, whisper_loss=0.09475, over 3941631.71 frames. 
], batch size: 92, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:20:48,189 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-11 04:20:56,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=911090.0, ans=0.035 2024-08-11 04:21:07,753 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.421e+02 2024-08-11 04:21:10,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=911190.0, ans=0.2 2024-08-11 04:21:22,741 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 04:21:24,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=911290.0, ans=0.125 2024-08-11 04:21:38,506 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 04:22:01,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=911590.0, ans=0.1 2024-08-11 04:22:02,630 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4200, loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.0002202, whisper_loss=0.09047, over 18929.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01146, ecapa_loss=0.00021, whisper_loss=0.09408, over 3921356.33 frames. ], batch size: 76, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:22:27,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=911690.0, ans=0.125 2024-08-11 04:22:30,477 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
25 from LS+wenet, 9 from Vox, 23 fro AS 2024-08-11 04:22:51,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=911890.0, ans=0.2 2024-08-11 04:22:52,539 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 35 from Vox, 33 fro AS 2024-08-11 04:23:01,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.712e+01 3.098e+01 3.462e+01 7.406e+01, threshold=6.196e+01, percent-clipped=1.0 2024-08-11 04:23:06,039 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 04:23:13,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0 2024-08-11 04:23:13,824 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4250, loss[loss=0.1151, beats_loss=0.01181, ecapa_loss=0.0001823, whisper_loss=0.1014, over 23430.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01149, ecapa_loss=0.0002105, whisper_loss=0.09376, over 3931228.48 frames. ], batch size: 92, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:23:15,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=912090.0, ans=0.0 2024-08-11 04:23:18,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=912090.0, ans=0.0 2024-08-11 04:23:22,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.35 vs. 
limit=15.0 2024-08-11 04:23:25,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=912090.0, ans=0.025 2024-08-11 04:23:40,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=912290.0, ans=0.0 2024-08-11 04:23:45,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2024-08-11 04:23:53,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=912290.0, ans=0.1 2024-08-11 04:23:55,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=912390.0, ans=0.2 2024-08-11 04:23:59,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=912390.0, ans=0.125 2024-08-11 04:24:09,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=912490.0, ans=0.125 2024-08-11 04:24:10,517 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 04:24:12,024 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 04:24:22,554 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4300, loss[loss=0.0791, beats_loss=0.01574, ecapa_loss=0.0001691, whisper_loss=0.06168, over 18353.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01143, ecapa_loss=0.0002077, whisper_loss=0.09422, over 3909969.50 frames. 
], batch size: 78, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:24:28,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=912590.0, ans=0.2 2024-08-11 04:24:38,111 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 31 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 04:24:39,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=912690.0, ans=0.1 2024-08-11 04:24:40,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=912690.0, ans=0.125 2024-08-11 04:24:53,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=912790.0, ans=0.0 2024-08-11 04:25:06,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=912890.0, ans=0.1 2024-08-11 04:25:16,153 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.652e+01 2.970e+01 3.355e+01 6.636e+01, threshold=5.939e+01, percent-clipped=1.0 2024-08-11 04:25:24,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=912990.0, ans=0.125 2024-08-11 04:25:26,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=912990.0, ans=0.125 2024-08-11 04:25:28,354 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4350, loss[loss=0.08845, beats_loss=0.01263, ecapa_loss=0.000206, whisper_loss=0.07376, over 20285.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01144, ecapa_loss=0.0002079, whisper_loss=0.0939, over 3899424.24 frames. 
], batch size: 83, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:25:37,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.76 vs. limit=22.5 2024-08-11 04:25:39,181 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 04:25:48,314 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 04:25:54,702 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 40 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 04:25:56,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=913290.0, ans=0.0 2024-08-11 04:25:58,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=913290.0, ans=0.0 2024-08-11 04:26:09,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=913390.0, ans=0.1 2024-08-11 04:26:34,150 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4400, loss[loss=0.09427, beats_loss=0.01304, ecapa_loss=0.0001705, whisper_loss=0.07952, over 19448.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01151, ecapa_loss=0.0002087, whisper_loss=0.09334, over 3865884.32 frames. ], batch size: 78, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:26:43,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=913590.0, ans=0.0 2024-08-11 04:26:44,510 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 
25 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-11 04:27:09,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=913790.0, ans=0.0 2024-08-11 04:27:27,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.550e+01 2.848e+01 3.646e+01 5.843e+01, threshold=5.697e+01, percent-clipped=0.0 2024-08-11 04:27:33,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=913990.0, ans=0.125 2024-08-11 04:27:34,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=913990.0, ans=0.125 2024-08-11 04:27:36,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=913990.0, ans=0.2 2024-08-11 04:27:38,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2024-08-11 04:27:39,578 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4450, loss[loss=0.1102, beats_loss=0.01308, ecapa_loss=0.0001584, whisper_loss=0.09551, over 23081.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01137, ecapa_loss=0.0002091, whisper_loss=0.09472, over 3887186.46 frames. ], batch size: 92, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:27:41,044 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 04:27:48,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=914090.0, ans=0.5 2024-08-11 04:27:56,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=914190.0, ans=0.125 2024-08-11 04:27:59,886 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
22 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 04:28:11,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=914290.0, ans=0.07 2024-08-11 04:28:26,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=914390.0, ans=0.1 2024-08-11 04:28:28,639 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 04:28:51,465 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4500, loss[loss=0.1174, beats_loss=0.009535, ecapa_loss=0.0002088, whisper_loss=0.1057, over 19559.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01141, ecapa_loss=0.0002094, whisper_loss=0.09439, over 3866085.15 frames. ], batch size: 76, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:28:59,011 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-11 04:29:04,552 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 14 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-11 04:29:05,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=12.0 2024-08-11 04:29:10,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=914690.0, ans=0.0 2024-08-11 04:29:16,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.28 vs. limit=22.5 2024-08-11 04:29:20,105 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 04:29:24,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=914790.0, ans=0.125 2024-08-11 04:29:26,782 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
21 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 04:29:47,946 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.125e+01 2.660e+01 3.113e+01 3.675e+01 6.136e+01, threshold=6.226e+01, percent-clipped=1.0 2024-08-11 04:29:48,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=914990.0, ans=0.2 2024-08-11 04:29:56,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=914990.0, ans=0.02 2024-08-11 04:29:57,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=914990.0, ans=0.0 2024-08-11 04:29:59,995 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4550, loss[loss=0.1001, beats_loss=0.01228, ecapa_loss=0.0001776, whisper_loss=0.08604, over 19374.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01143, ecapa_loss=0.0002111, whisper_loss=0.09356, over 3880544.79 frames. ], batch size: 77, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:30:06,690 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 15 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-11 04:30:09,230 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-11 04:30:13,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=915190.0, ans=0.05 2024-08-11 04:30:50,277 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 04:31:04,836 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 04:31:05,914 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4600, loss[loss=0.1172, beats_loss=0.01087, ecapa_loss=0.0002132, whisper_loss=0.1042, over 22439.00 frames. 
], tot_loss[loss=0.107, beats_loss=0.01146, ecapa_loss=0.0002101, whisper_loss=0.09341, over 3878138.83 frames. ], batch size: 90, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:31:06,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=915590.0, ans=0.0 2024-08-11 04:31:25,136 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.52 vs. limit=15.0 2024-08-11 04:31:46,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=915890.0, ans=0.1 2024-08-11 04:31:51,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=915890.0, ans=0.125 2024-08-11 04:31:51,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=915890.0, ans=0.0 2024-08-11 04:31:59,071 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.820e+01 3.108e+01 3.626e+01 5.972e+01, threshold=6.216e+01, percent-clipped=0.0 2024-08-11 04:31:59,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=915990.0, ans=0.125 2024-08-11 04:32:02,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=915990.0, ans=0.0 2024-08-11 04:32:06,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2024-08-11 04:32:08,393 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 04:32:11,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4650, loss[loss=0.09736, beats_loss=0.01049, ecapa_loss=0.0002353, whisper_loss=0.08452, over 22184.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0115, ecapa_loss=0.0002102, whisper_loss=0.09306, over 3888221.14 frames. ], batch size: 93, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:32:14,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=916090.0, ans=0.1 2024-08-11 04:32:26,389 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:32:30,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=916190.0, ans=0.125 2024-08-11 04:32:41,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=916290.0, ans=0.125 2024-08-11 04:32:42,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=916290.0, ans=22.5 2024-08-11 04:32:49,034 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:33:04,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=916490.0, ans=0.2 2024-08-11 04:33:17,530 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4700, loss[loss=0.112, beats_loss=0.01156, ecapa_loss=0.0002358, whisper_loss=0.09809, over 19785.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01144, ecapa_loss=0.0002092, whisper_loss=0.09374, over 3855939.45 frames. 
], batch size: 81, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:33:38,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=916690.0, ans=0.125 2024-08-11 04:33:42,227 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2024-08-11 04:33:53,831 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 04:34:05,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=916890.0, ans=0.2 2024-08-11 04:34:07,706 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:34:12,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.755e+01 3.145e+01 3.501e+01 4.476e+01, threshold=6.290e+01, percent-clipped=0.0 2024-08-11 04:34:16,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=916990.0, ans=0.0 2024-08-11 04:34:23,829 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4750, loss[loss=0.1137, beats_loss=0.01092, ecapa_loss=0.0001698, whisper_loss=0.1011, over 17267.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0115, ecapa_loss=0.0002085, whisper_loss=0.09384, over 3855004.07 frames. ], batch size: 65, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:34:25,375 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 16 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 04:34:34,185 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
25 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 04:34:50,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=917290.0, ans=0.0 2024-08-11 04:35:21,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=917490.0, ans=0.1 2024-08-11 04:35:24,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=917490.0, ans=0.125 2024-08-11 04:35:28,821 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4800, loss[loss=0.1013, beats_loss=0.01098, ecapa_loss=0.0002522, whisper_loss=0.08778, over 17750.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01147, ecapa_loss=0.0002107, whisper_loss=0.09444, over 3876777.45 frames. ], batch size: 73, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:35:46,559 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=15.0 2024-08-11 04:35:49,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=917690.0, ans=0.125 2024-08-11 04:36:08,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=917890.0, ans=0.125 2024-08-11 04:36:16,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-11 04:36:19,851 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
31 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-11 04:36:21,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=917990.0, ans=0.125 2024-08-11 04:36:22,197 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.878e+01 3.268e+01 3.983e+01 7.610e+01, threshold=6.536e+01, percent-clipped=1.0 2024-08-11 04:36:29,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=917990.0, ans=0.125 2024-08-11 04:36:34,023 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4850, loss[loss=0.1197, beats_loss=0.008327, ecapa_loss=0.0001917, whisper_loss=0.1094, over 15536.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01139, ecapa_loss=0.0002101, whisper_loss=0.09528, over 3884907.56 frames. ], batch size: 57, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:36:38,241 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 04:36:41,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=918090.0, ans=0.0 2024-08-11 04:36:55,373 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 04:36:59,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=918290.0, ans=0.125 2024-08-11 04:37:12,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=918390.0, ans=0.125 2024-08-11 04:37:38,776 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4900, loss[loss=0.0979, beats_loss=0.01228, ecapa_loss=0.0002054, whisper_loss=0.08357, over 18755.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01141, ecapa_loss=0.0002095, whisper_loss=0.09474, over 3866018.18 frames. 
], batch size: 76, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:37:39,023 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 04:37:51,658 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 04:37:58,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=918690.0, ans=0.0 2024-08-11 04:38:04,855 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-11 04:38:05,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=918790.0, ans=0.09899494936611666 2024-08-11 04:38:07,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=918790.0, ans=0.2 2024-08-11 04:38:19,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=918890.0, ans=0.125 2024-08-11 04:38:21,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=918890.0, ans=0.2 2024-08-11 04:38:24,399 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 04:38:31,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.611e+01 2.961e+01 3.443e+01 6.053e+01, threshold=5.922e+01, percent-clipped=0.0 2024-08-11 04:38:42,680 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 04:38:43,814 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 4950, loss[loss=0.103, beats_loss=0.01138, ecapa_loss=0.0002037, whisper_loss=0.08955, over 15464.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01146, ecapa_loss=0.0002098, whisper_loss=0.09413, over 3862466.03 frames. 
], batch size: 59, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:38:46,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=919090.0, ans=0.125 2024-08-11 04:38:50,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=919090.0, ans=0.125 2024-08-11 04:38:53,356 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:39:01,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=919190.0, ans=0.2 2024-08-11 04:39:08,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=919190.0, ans=0.1 2024-08-11 04:39:17,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=919290.0, ans=0.125 2024-08-11 04:39:29,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=919390.0, ans=0.0 2024-08-11 04:39:43,660 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-11 04:39:53,175 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5000, loss[loss=0.1041, beats_loss=0.01073, ecapa_loss=0.0001963, whisper_loss=0.09139, over 21305.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01138, ecapa_loss=0.0002125, whisper_loss=0.09467, over 3874548.11 frames. ], batch size: 86, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:39:58,802 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 19 from Vox, 52 fro AS 2024-08-11 04:40:09,747 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
23 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-11 04:40:12,575 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 04:40:29,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=919790.0, ans=0.035 2024-08-11 04:40:34,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=919890.0, ans=0.125 2024-08-11 04:40:37,251 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 04:40:37,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=919890.0, ans=0.125 2024-08-11 04:40:54,903 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.724e+01 2.983e+01 3.443e+01 5.585e+01, threshold=5.966e+01, percent-clipped=0.0 2024-08-11 04:41:06,929 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5050, loss[loss=0.1106, beats_loss=0.01104, ecapa_loss=0.0001905, whisper_loss=0.09766, over 22129.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01148, ecapa_loss=0.0002108, whisper_loss=0.09489, over 3921984.10 frames. ], batch size: 87, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:41:11,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=920090.0, ans=0.125 2024-08-11 04:41:22,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=920190.0, ans=0.0 2024-08-11 04:41:34,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.72 vs. limit=22.5 2024-08-11 04:41:51,466 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
12 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 04:41:58,095 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 04:42:12,177 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 04:42:14,707 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-11 04:42:18,643 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5100, loss[loss=0.1173, beats_loss=0.01189, ecapa_loss=0.0002322, whisper_loss=0.1031, over 14868.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.0115, ecapa_loss=0.0002093, whisper_loss=0.09519, over 3939799.17 frames. ], batch size: 61, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:42:22,830 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 33 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 04:42:28,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=920590.0, ans=0.1 2024-08-11 04:42:33,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=920690.0, ans=0.125 2024-08-11 04:42:41,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=920690.0, ans=0.125 2024-08-11 04:42:43,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=920690.0, ans=0.1 2024-08-11 04:43:17,419 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.667e+01 3.175e+01 3.575e+01 5.874e+01, threshold=6.350e+01, percent-clipped=0.0 2024-08-11 04:43:17,680 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 04:43:27,099 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 04:43:31,565 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5150, loss[loss=0.1187, beats_loss=0.01054, ecapa_loss=0.0002106, whisper_loss=0.1061, over 21775.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01151, ecapa_loss=0.000208, whisper_loss=0.09512, over 3925336.68 frames. ], batch size: 87, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:43:39,924 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 04:43:56,476 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 04:44:13,796 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs. limit=10.0 2024-08-11 04:44:19,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=921390.0, ans=0.125 2024-08-11 04:44:22,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0 2024-08-11 04:44:45,518 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5200, loss[loss=0.08096, beats_loss=0.01246, ecapa_loss=0.0001791, whisper_loss=0.06671, over 18526.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.0115, ecapa_loss=0.000209, whisper_loss=0.09415, over 3911715.14 frames. ], batch size: 73, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:44:51,561 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-11 04:44:56,185 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 19 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-11 04:45:07,469 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
14 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 04:45:12,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=921690.0, ans=0.2 2024-08-11 04:45:14,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2024-08-11 04:45:21,559 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 04:45:23,579 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 04:45:25,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=921790.0, ans=0.1 2024-08-11 04:45:37,048 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 24 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 04:45:40,914 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.16 vs. limit=10.0 2024-08-11 04:45:47,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.623e+01 2.992e+01 3.438e+01 5.362e+01, threshold=5.985e+01, percent-clipped=0.0 2024-08-11 04:45:57,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=15.0 2024-08-11 04:45:58,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=921990.0, ans=0.125 2024-08-11 04:46:00,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5250, loss[loss=0.0945, beats_loss=0.01184, ecapa_loss=0.0001454, whisper_loss=0.0812, over 15553.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01151, ecapa_loss=0.0002087, whisper_loss=0.09358, over 3870663.21 frames. 
], batch size: 57, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:46:07,373 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 04:46:07,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=922090.0, ans=0.125 2024-08-11 04:46:12,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=922090.0, ans=15.0 2024-08-11 04:46:13,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=922090.0, ans=0.0 2024-08-11 04:46:26,852 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 30 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 04:46:41,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=922290.0, ans=0.125 2024-08-11 04:46:44,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=922390.0, ans=0.125 2024-08-11 04:46:52,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=922390.0, ans=0.07 2024-08-11 04:47:02,403 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2024-08-11 04:47:10,525 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 04:47:13,023 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5300, loss[loss=0.1055, beats_loss=0.009949, ecapa_loss=0.0002647, whisper_loss=0.09289, over 21472.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01148, ecapa_loss=0.0002082, whisper_loss=0.09319, over 3868130.38 frames. 
], batch size: 89, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:47:18,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=922590.0, ans=0.1 2024-08-11 04:47:31,459 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:47:36,491 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 04:47:57,516 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:48:10,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=922990.0, ans=0.2 2024-08-11 04:48:11,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.730e+01 3.116e+01 3.540e+01 5.766e+01, threshold=6.232e+01, percent-clipped=0.0 2024-08-11 04:48:13,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=922990.0, ans=0.0 2024-08-11 04:48:14,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=922990.0, ans=0.04949747468305833 2024-08-11 04:48:14,847 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2024-08-11 04:48:24,734 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5350, loss[loss=0.1064, beats_loss=0.01204, ecapa_loss=0.000207, whisper_loss=0.09224, over 21023.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01158, ecapa_loss=0.0002071, whisper_loss=0.09258, over 3884061.85 frames. 
], batch size: 82, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:48:39,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-11 04:48:42,202 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:48:55,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=923290.0, ans=0.125 2024-08-11 04:49:18,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=923390.0, ans=0.125 2024-08-11 04:49:23,417 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 04:49:36,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5400, loss[loss=0.1316, beats_loss=0.006998, ecapa_loss=0.0002082, whisper_loss=0.1225, over 14939.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01148, ecapa_loss=0.0002065, whisper_loss=0.0939, over 3906691.20 frames. ], batch size: 55, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:49:43,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=923590.0, ans=0.0 2024-08-11 04:49:53,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=923690.0, ans=0.125 2024-08-11 04:49:53,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=923690.0, ans=0.125 2024-08-11 04:49:56,600 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
21 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 04:50:01,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=923690.0, ans=0.0 2024-08-11 04:50:12,573 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-11 04:50:16,139 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2024-08-11 04:50:25,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=923890.0, ans=0.125 2024-08-11 04:50:29,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=923990.0, ans=0.125 2024-08-11 04:50:30,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=923990.0, ans=0.0 2024-08-11 04:50:31,603 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.648e+01 2.918e+01 3.540e+01 6.193e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-11 04:50:43,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5450, loss[loss=0.109, beats_loss=0.01198, ecapa_loss=0.0002669, whisper_loss=0.09433, over 22076.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01146, ecapa_loss=0.0002058, whisper_loss=0.09367, over 3876647.55 frames. ], batch size: 94, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:50:47,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=924090.0, ans=0.1 2024-08-11 04:50:57,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.04 vs. limit=12.0 2024-08-11 04:50:58,374 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
15 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 04:50:58,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=924190.0, ans=0.0 2024-08-11 04:51:07,624 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 04:51:09,173 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 04:51:09,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-11 04:51:11,051 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.03 vs. limit=22.5 2024-08-11 04:51:23,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=924390.0, ans=0.125 2024-08-11 04:51:25,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=924390.0, ans=0.125 2024-08-11 04:51:30,185 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 26 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 04:51:30,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=924390.0, ans=0.1 2024-08-11 04:51:36,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=924490.0, ans=0.125 2024-08-11 04:51:49,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=924590.0, ans=0.125 2024-08-11 04:51:50,713 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5500, loss[loss=0.1074, beats_loss=0.01252, ecapa_loss=0.0001209, whisper_loss=0.09367, over 17189.00 frames. 
], tot_loss[loss=0.1076, beats_loss=0.0114, ecapa_loss=0.0002051, whisper_loss=0.09416, over 3875019.78 frames. ], batch size: 62, lr: 9.08e-03, grad_scale: 2251799813685248.0 2024-08-11 04:51:53,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=924590.0, ans=0.2 2024-08-11 04:51:53,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=924590.0, ans=0.0 2024-08-11 04:52:12,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=924690.0, ans=0.5 2024-08-11 04:52:19,791 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-11 04:52:25,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=924790.0, ans=0.125 2024-08-11 04:52:32,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=924890.0, ans=0.0 2024-08-11 04:52:40,424 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 04:52:43,170 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 04:52:44,644 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.644e+01 3.103e+01 3.543e+01 6.260e+01, threshold=6.206e+01, percent-clipped=1.0 2024-08-11 04:52:47,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=924990.0, ans=0.125 2024-08-11 04:52:53,690 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
34 from LS+wenet, 29 from Vox, 29 fro AS
2024-08-11 04:52:56,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5550, loss[loss=0.1141, beats_loss=0.00928, ecapa_loss=0.0002336, whisper_loss=0.1025, over 22336.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01146, ecapa_loss=0.000207, whisper_loss=0.09418, over 3912222.76 frames. ], batch size: 91, lr: 9.08e-03, grad_scale: 2251799813685248.0
2024-08-11 04:52:58,027 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5
2024-08-11 04:53:01,878 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.448e-02
2024-08-11 04:53:03,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=925090.0, ans=0.0
2024-08-11 04:53:49,301 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.61 vs. limit=15.0
2024-08-11 04:54:01,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5600, loss[loss=0.1147, beats_loss=0.01236, ecapa_loss=0.00018, whisper_loss=0.1005, over 23637.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01148, ecapa_loss=0.0002067, whisper_loss=0.09408, over 3930387.08 frames. ], batch size: 93, lr: 9.08e-03, grad_scale: 2251799813685248.0
2024-08-11 04:54:11,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=925590.0, ans=0.125
2024-08-11 04:54:16,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=925690.0, ans=0.0
2024-08-11 04:54:20,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=925690.0, ans=0.09899494936611666
2024-08-11 04:54:20,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=925690.0, ans=0.125
2024-08-11 04:54:27,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=925790.0, ans=0.125
2024-08-11 04:54:35,918 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 fro AS
2024-08-11 04:54:36,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=925790.0, ans=0.0
2024-08-11 04:54:44,370 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.00 vs. limit=22.5
2024-08-11 04:54:45,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=925890.0, ans=0.0
2024-08-11 04:54:53,558 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS
2024-08-11 04:54:54,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.711e+01 3.123e+01 3.568e+01 9.227e+01, threshold=6.245e+01, percent-clipped=1.0
2024-08-11 04:54:56,017 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS
2024-08-11 04:55:05,870 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5650, loss[loss=0.09998, beats_loss=0.01341, ecapa_loss=0.0002241, whisper_loss=0.08433, over 21847.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01147, ecapa_loss=0.0002081, whisper_loss=0.09356, over 3925360.51 frames. ], batch size: 92, lr: 9.08e-03, grad_scale: 2251799813685248.0
2024-08-11 04:55:22,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=926190.0, ans=0.125
2024-08-11 04:55:25,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=926190.0, ans=0.0
2024-08-11 04:55:27,970 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 19 from LS+wenet, 28 from Vox, 39 fro AS
2024-08-11 04:55:31,844 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 17 from Vox, 44 fro AS
2024-08-11 04:55:40,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=926290.0, ans=0.125
2024-08-11 04:55:44,578 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.70 vs. limit=15.0
2024-08-11 04:55:50,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.08 vs. limit=15.0
2024-08-11 04:55:50,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0
2024-08-11 04:55:57,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=926490.0, ans=0.0
2024-08-11 04:56:07,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=926490.0, ans=0.0
2024-08-11 04:56:10,942 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5700, loss[loss=0.1199, beats_loss=0.01156, ecapa_loss=0.0001986, whisper_loss=0.1063, over 23132.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01154, ecapa_loss=0.0002093, whisper_loss=0.09377, over 3926512.07 frames. ], batch size: 90, lr: 9.07e-03, grad_scale: 2251799813685248.0
2024-08-11 04:56:13,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=926590.0, ans=0.125
2024-08-11 04:56:15,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=926590.0, ans=0.125
2024-08-11 04:56:18,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=926590.0, ans=10.0
2024-08-11 04:56:20,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=926590.0, ans=0.0
2024-08-11 04:56:23,763 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 21 from Vox, 47 fro AS
2024-08-11 04:56:38,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=926790.0, ans=0.1
2024-08-11 04:56:39,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=926790.0, ans=0.2
2024-08-11 04:56:40,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0
2024-08-11 04:56:55,340 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS
2024-08-11 04:56:57,784 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 fro AS
2024-08-11 04:57:03,971 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.796e+01 3.057e+01 3.549e+01 5.833e+01, threshold=6.113e+01, percent-clipped=0.0
2024-08-11 04:57:04,185 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 19 from Vox, 39 fro AS
2024-08-11 04:57:14,591 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 22 from LS+wenet, 27 from Vox, 36 fro AS
2024-08-11 04:57:15,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5750, loss[loss=0.09064, beats_loss=0.01241, ecapa_loss=0.0002713, whisper_loss=0.07552, over 19528.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01155, ecapa_loss=0.0002094, whisper_loss=0.09373, over 3943599.85 frames. ], batch size: 85, lr: 9.07e-03, grad_scale: 2251799813685248.0
2024-08-11 04:57:18,485 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 24 from Vox, 26 fro AS
2024-08-11 04:57:20,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=927090.0, ans=0.125
2024-08-11 04:57:28,829 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 31 from Vox, 24 fro AS
2024-08-11 04:57:32,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=7.88 vs. limit=12.0
2024-08-11 04:57:39,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=927190.0, ans=0.2
2024-08-11 04:57:45,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=927290.0, ans=0.125
2024-08-11 04:57:49,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=927290.0, ans=0.0
2024-08-11 04:57:54,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=927390.0, ans=0.125
2024-08-11 04:58:21,381 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5800, loss[loss=0.111, beats_loss=0.01059, ecapa_loss=0.0002625, whisper_loss=0.09778, over 15320.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01151, ecapa_loss=0.0002094, whisper_loss=0.0936, over 3916159.63 frames. ], batch size: 61, lr: 9.07e-03, grad_scale: 2251799813685248.0
2024-08-11 04:58:21,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=927590.0, ans=0.125
2024-08-11 04:58:39,814 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS
2024-08-11 04:58:42,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=927690.0, ans=0.0
2024-08-11 04:58:46,131 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 18 from Vox, 39 fro AS
2024-08-11 04:58:47,394 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 19 from LS+wenet, 24 from Vox, 35 fro AS
2024-08-11 04:59:14,146 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.689e+01 2.933e+01 3.272e+01 5.873e+01, threshold=5.865e+01, percent-clipped=0.0
2024-08-11 04:59:20,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=927990.0, ans=0.2
2024-08-11 04:59:22,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=927990.0, ans=0.05
2024-08-11 04:59:25,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5850, loss[loss=0.1053, beats_loss=0.01013, ecapa_loss=0.00023, whisper_loss=0.09291, over 14467.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01144, ecapa_loss=0.0002103, whisper_loss=0.09355, over 3915637.85 frames. ], batch size: 59, lr: 9.07e-03, grad_scale: 2251799813685248.0
2024-08-11 04:59:26,149 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS
2024-08-11 04:59:28,678 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 12 from Vox, 33 fro AS
2024-08-11 04:59:36,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=928090.0, ans=0.125
2024-08-11 04:59:48,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=928190.0, ans=0.0
2024-08-11 04:59:54,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.94 vs. limit=22.5
2024-08-11 04:59:57,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=928290.0, ans=0.0
2024-08-11 05:00:04,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=928390.0, ans=0.1
2024-08-11 05:00:25,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=928490.0, ans=0.5
2024-08-11 05:00:26,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2024-08-11 05:00:30,673 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5900, loss[loss=0.1057, beats_loss=0.01127, ecapa_loss=0.0001829, whisper_loss=0.09258, over 14108.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01156, ecapa_loss=0.0002097, whisper_loss=0.09323, over 3905279.10 frames. ], batch size: 54, lr: 9.06e-03, grad_scale: 2251799813685248.0
2024-08-11 05:00:41,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=928590.0, ans=0.0
2024-08-11 05:00:49,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=928690.0, ans=0.015
2024-08-11 05:00:56,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=928790.0, ans=0.125
2024-08-11 05:01:21,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=928890.0, ans=0.0
2024-08-11 05:01:24,868 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.592e+01 2.867e+01 3.350e+01 5.876e+01, threshold=5.735e+01, percent-clipped=1.0
2024-08-11 05:01:30,903 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS
2024-08-11 05:01:36,227 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 5950, loss[loss=0.1187, beats_loss=0.01105, ecapa_loss=0.0001779, whisper_loss=0.1058, over 21917.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01159, ecapa_loss=0.00021, whisper_loss=0.09296, over 3891690.95 frames. ], batch size: 83, lr: 9.06e-03, grad_scale: 2251799813685248.0
2024-08-11 05:01:40,251 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS
2024-08-11 05:01:44,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.11 vs. limit=22.5
2024-08-11 05:01:49,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=929190.0, ans=0.125
2024-08-11 05:02:02,322 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS
2024-08-11 05:02:05,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=929290.0, ans=0.0
2024-08-11 05:02:07,648 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 12 from Vox, 31 fro AS
2024-08-11 05:02:08,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=929290.0, ans=0.125
2024-08-11 05:02:20,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=929390.0, ans=0.0
2024-08-11 05:02:26,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.29 vs. limit=22.5
2024-08-11 05:02:29,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0
2024-08-11 05:02:39,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=929490.0, ans=0.2
2024-08-11 05:02:41,660 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6000, loss[loss=0.1001, beats_loss=0.01242, ecapa_loss=0.0002291, whisper_loss=0.08539, over 21264.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01154, ecapa_loss=0.000209, whisper_loss=0.09346, over 3894389.36 frames. ], batch size: 88, lr: 9.06e-03, grad_scale: 2251799813685248.0
2024-08-11 05:02:41,661 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-11 05:03:21,158 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on ASR_libri: loss=0.2594, beats_loss=0, ecapa_loss=0.0006753, whisper_loss=0.2527, over 922467.00 frames.
2024-08-11 05:03:38,391 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on SV_voxceleb1: loss=0.005594, beats_loss=0, ecapa_loss=0.0005594, whisper_loss=0, over 939242.00 frames.
2024-08-11 05:05:33,520 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on AT_audioset: loss=0.0256, beats_loss=0.0256, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-11 05:05:33,523 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB
2024-08-11 05:05:44,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=929590.0, ans=0.125
2024-08-11 05:05:47,499 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS
2024-08-11 05:06:06,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=929790.0, ans=0.07
2024-08-11 05:06:22,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=929890.0, ans=0.125
2024-08-11 05:06:27,185 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.542e+01 2.913e+01 3.356e+01 5.863e+01, threshold=5.826e+01, percent-clipped=1.0
2024-08-11 05:06:36,431 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 23 from Vox, 28 fro AS
2024-08-11 05:06:38,737 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6050, loss[loss=0.1185, beats_loss=0.012, ecapa_loss=0.0001858, whisper_loss=0.1046, over 22992.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01151, ecapa_loss=0.0002104, whisper_loss=0.09415, over 3909138.75 frames. ], batch size: 92, lr: 9.06e-03, grad_scale: 2251799813685248.0
2024-08-11 05:06:39,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=930090.0, ans=0.2
2024-08-11 05:06:41,913 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 fro AS
2024-08-11 05:06:47,412 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.00 vs. limit=22.5
2024-08-11 05:06:50,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=930190.0, ans=0.0
2024-08-11 05:06:54,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=930190.0, ans=0.125
2024-08-11 05:06:59,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0
2024-08-11 05:07:21,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.39 vs. limit=10.0
2024-08-11 05:07:43,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6100, loss[loss=0.1168, beats_loss=0.008739, ecapa_loss=0.0002833, whisper_loss=0.1052, over 14279.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01151, ecapa_loss=0.0002113, whisper_loss=0.09388, over 3890679.87 frames. ], batch size: 60, lr: 9.05e-03, grad_scale: 2251799813685248.0
2024-08-11 05:07:45,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=930590.0, ans=0.2
2024-08-11 05:07:46,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=930590.0, ans=0.0
2024-08-11 05:07:52,586 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS
2024-08-11 05:08:02,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=930690.0, ans=0.125
2024-08-11 05:08:15,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=930790.0, ans=10.0
2024-08-11 05:08:15,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=930790.0, ans=0.0
2024-08-11 05:08:16,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=930790.0, ans=0.125
2024-08-11 05:08:22,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=930890.0, ans=0.1
2024-08-11 05:08:37,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.611e+01 2.902e+01 3.349e+01 2.714e+02, threshold=5.803e+01, percent-clipped=1.0
2024-08-11 05:08:38,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=930990.0, ans=10.0
2024-08-11 05:08:41,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=930990.0, ans=0.0
2024-08-11 05:08:47,984 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS
2024-08-11 05:08:48,989 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6150, loss[loss=0.1165, beats_loss=0.01248, ecapa_loss=0.0002215, whisper_loss=0.1018, over 21893.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01149, ecapa_loss=0.0002114, whisper_loss=0.09421, over 3886485.53 frames. ], batch size: 88, lr: 9.05e-03, grad_scale: 2251799813685248.0
2024-08-11 05:08:49,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=931090.0, ans=0.07
2024-08-11 05:08:56,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=931090.0, ans=0.0
2024-08-11 05:09:30,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0
2024-08-11 05:09:37,933 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS
2024-08-11 05:09:54,754 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6200, loss[loss=0.1293, beats_loss=0.00982, ecapa_loss=0.0002206, whisper_loss=0.1173, over 14745.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01147, ecapa_loss=0.00021, whisper_loss=0.09464, over 3891542.59 frames. ], batch size: 58, lr: 9.05e-03, grad_scale: 2251799813685248.0
2024-08-11 05:10:14,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=931690.0, ans=0.125
2024-08-11 05:10:23,527 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS
2024-08-11 05:10:25,978 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS
2024-08-11 05:10:37,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=931890.0, ans=0.125
2024-08-11 05:10:46,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=931990.0, ans=0.125
2024-08-11 05:10:48,196 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.318e+01 2.725e+01 3.050e+01 3.372e+01 5.411e+01, threshold=6.100e+01, percent-clipped=0.0
2024-08-11 05:10:52,521 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 20 from Vox, 17 fro AS
2024-08-11 05:10:58,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=12.0
2024-08-11 05:11:00,393 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6250, loss[loss=0.1145, beats_loss=0.01139, ecapa_loss=0.0002046, whisper_loss=0.1011, over 22331.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01141, ecapa_loss=0.0002093, whisper_loss=0.09479, over 3888826.98 frames. ], batch size: 89, lr: 9.05e-03, grad_scale: 2251799813685248.0
2024-08-11 05:11:16,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2024-08-11 05:11:42,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=932390.0, ans=0.1
2024-08-11 05:11:44,275 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.10 vs. limit=22.5
2024-08-11 05:11:59,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=932490.0, ans=0.025
2024-08-11 05:12:05,551 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6300, loss[loss=0.09294, beats_loss=0.01234, ecapa_loss=0.0002204, whisper_loss=0.0784, over 14794.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01146, ecapa_loss=0.0002087, whisper_loss=0.09432, over 3855599.76 frames. ], batch size: 62, lr: 9.04e-03, grad_scale: 2251799813685248.0
2024-08-11 05:12:21,968 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 28 from LS+wenet, 21 from Vox, 14 fro AS
2024-08-11 05:12:28,809 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.63 vs. limit=22.5
2024-08-11 05:12:36,158 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 25 from Vox, 32 fro AS
2024-08-11 05:12:42,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=932790.0, ans=0.2
2024-08-11 05:12:59,666 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.671e+01 3.003e+01 3.406e+01 5.856e+01, threshold=6.007e+01, percent-clipped=0.0
2024-08-11 05:13:01,151 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 fro AS
2024-08-11 05:13:05,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=932990.0, ans=0.0
2024-08-11 05:13:08,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=932990.0, ans=0.0
2024-08-11 05:13:11,579 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6350, loss[loss=0.1167, beats_loss=0.01069, ecapa_loss=0.0002241, whisper_loss=0.1037, over 19774.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01139, ecapa_loss=0.000208, whisper_loss=0.09421, over 3864834.22 frames. ], batch size: 78, lr: 9.04e-03, grad_scale: 2251799813685248.0
2024-08-11 05:13:13,033 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS
2024-08-11 05:13:24,107 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 fro AS
2024-08-11 05:13:32,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=933190.0, ans=0.0
2024-08-11 05:13:46,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=933290.0, ans=0.125
2024-08-11 05:13:48,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=933290.0, ans=0.0
2024-08-11 05:13:54,460 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 29 from Vox, 31 fro AS
2024-08-11 05:14:00,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=933390.0, ans=0.125
2024-08-11 05:14:05,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=933390.0, ans=0.0
2024-08-11 05:14:21,229 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6400, loss[loss=0.1248, beats_loss=0.008269, ecapa_loss=0.0002409, whisper_loss=0.1141, over 23029.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01138, ecapa_loss=0.0002089, whisper_loss=0.09445, over 3880735.38 frames. ], batch size: 89, lr: 9.04e-03, grad_scale: 2251799813685248.0
2024-08-11 05:14:36,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=933690.0, ans=0.0
2024-08-11 05:14:47,602 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 20 from Vox, 31 fro AS
2024-08-11 05:14:57,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=15.0
2024-08-11 05:14:58,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.03 vs. limit=15.0
2024-08-11 05:15:05,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.75 vs. limit=10.0
2024-08-11 05:15:08,736 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0
2024-08-11 05:15:09,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=933890.0, ans=0.125
2024-08-11 05:15:13,509 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS
2024-08-11 05:15:17,306 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.766e+01 3.115e+01 3.539e+01 7.313e+01, threshold=6.229e+01, percent-clipped=3.0
2024-08-11 05:15:20,284 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 fro AS
2024-08-11 05:15:29,549 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6450, loss[loss=0.0872, beats_loss=0.01342, ecapa_loss=0.0001863, whisper_loss=0.07192, over 21035.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01139, ecapa_loss=0.0002083, whisper_loss=0.09464, over 3902522.15 frames. ], batch size: 87, lr: 9.04e-03, grad_scale: 2251799813685248.0
2024-08-11 05:15:37,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=934090.0, ans=0.1
2024-08-11 05:15:54,723 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0
2024-08-11 05:16:06,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=934290.0, ans=0.1
2024-08-11 05:16:07,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0
2024-08-11 05:16:08,191 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS
2024-08-11 05:16:18,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=934390.0, ans=0.125
2024-08-11 05:16:21,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=934390.0, ans=0.2
2024-08-11 05:16:33,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=934490.0, ans=0.1
2024-08-11 05:16:38,614 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS
2024-08-11 05:16:42,908 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6500, loss[loss=0.1196, beats_loss=0.01036, ecapa_loss=0.0002464, whisper_loss=0.1068, over 21720.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01136, ecapa_loss=0.0002091, whisper_loss=0.09487, over 3915741.90 frames. ], batch size: 90, lr: 9.03e-03, grad_scale: 2251799813685248.0
2024-08-11 05:16:52,312 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 15 from Vox, 33 fro AS
2024-08-11 05:17:09,810 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 fro AS
2024-08-11 05:17:20,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=934790.0, ans=0.0
2024-08-11 05:17:32,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=934890.0, ans=0.125
2024-08-11 05:17:42,410 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.816e+01 3.248e+01 3.661e+01 5.361e+01, threshold=6.497e+01, percent-clipped=0.0
2024-08-11 05:17:42,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=934990.0, ans=0.1
2024-08-11 05:17:43,824 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 15 from LS+wenet, 24 from Vox, 39 fro AS
2024-08-11 05:17:44,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=934990.0, ans=0.0
2024-08-11 05:17:45,033 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 22 from LS+wenet, 29 from Vox, 45 fro AS
2024-08-11 05:17:49,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=934990.0, ans=0.0
2024-08-11 05:17:52,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=934990.0, ans=0.125
2024-08-11 05:17:55,960 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6550, loss[loss=0.1089, beats_loss=0.01273, ecapa_loss=0.0001767, whisper_loss=0.09436, over 19404.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01147, ecapa_loss=0.0002078, whisper_loss=0.09467, over 3944621.93 frames. ], batch size: 78, lr: 9.03e-03, grad_scale: 2251799813685248.0
2024-08-11 05:18:02,515 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS
2024-08-11 05:18:02,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=935090.0, ans=0.2
2024-08-11 05:18:16,497 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0
2024-08-11 05:18:29,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=935290.0, ans=0.0
2024-08-11 05:18:59,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=935490.0, ans=0.09899494936611666
2024-08-11 05:19:05,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=935490.0, ans=0.2
2024-08-11 05:19:11,368 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6600, loss[loss=0.09593, beats_loss=0.01267, ecapa_loss=0.0002029, whisper_loss=0.08123, over 21123.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01142, ecapa_loss=0.000209, whisper_loss=0.09466, over 3957869.89 frames. ], batch size: 89, lr: 9.03e-03, grad_scale: 2251799813685248.0
2024-08-11 05:19:22,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=935590.0, ans=0.125
2024-08-11 05:19:39,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=935790.0, ans=0.1
2024-08-11 05:19:53,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=935790.0, ans=0.125
2024-08-11 05:19:54,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=935890.0, ans=0.125
2024-08-11 05:19:59,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=935890.0, ans=0.0
2024-08-11 05:20:04,375 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 21 from LS+wenet, 29 from Vox, 44 fro AS
2024-08-11 05:20:06,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=935890.0, ans=0.0
2024-08-11 05:20:11,828 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.766e+01 3.102e+01 3.582e+01 5.637e+01, threshold=6.205e+01, percent-clipped=0.0
2024-08-11 05:20:16,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=935990.0, ans=0.1
2024-08-11 05:20:24,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=936090.0, ans=0.125
2024-08-11 05:20:25,128 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6650, loss[loss=0.1147, beats_loss=0.01094, ecapa_loss=0.0001859, whisper_loss=0.1019, over 18685.00 frames. ], tot_loss[loss=0.108, beats_loss=0.0115, ecapa_loss=0.0002085, whisper_loss=0.09446, over 3970908.25 frames. ], batch size: 71, lr: 9.03e-03, grad_scale: 2251799813685248.0
2024-08-11 05:20:32,935 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS
2024-08-11 05:20:38,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=936090.0, ans=0.1
2024-08-11 05:20:43,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0
2024-08-11 05:20:46,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=936190.0, ans=0.125
2024-08-11 05:21:06,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=936290.0, ans=0.0
2024-08-11 05:21:09,775 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-11 05:21:14,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.14 vs. limit=12.0
2024-08-11 05:21:32,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=936490.0, ans=0.07
2024-08-11 05:21:34,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=936490.0, ans=0.125
2024-08-11 05:21:40,343 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6700, loss[loss=0.1015, beats_loss=0.009744, ecapa_loss=0.00022, whisper_loss=0.08953, over 16473.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01152, ecapa_loss=0.0002083, whisper_loss=0.094, over 3917482.08 frames. 
], batch size: 64, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:21:40,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=936590.0, ans=0.2 2024-08-11 05:21:45,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=936590.0, ans=0.125 2024-08-11 05:21:47,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=936590.0, ans=0.125 2024-08-11 05:21:57,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=936690.0, ans=0.0 2024-08-11 05:22:03,264 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 05:22:03,612 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:22:10,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=936790.0, ans=0.2 2024-08-11 05:22:10,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=936790.0, ans=0.125 2024-08-11 05:22:36,402 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 05:22:39,347 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.752e+01 3.187e+01 3.868e+01 6.125e+01, threshold=6.373e+01, percent-clipped=0.0 2024-08-11 05:22:52,977 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6750, loss[loss=0.07164, beats_loss=0.01344, ecapa_loss=0.0001824, whisper_loss=0.05637, over 18858.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01158, ecapa_loss=0.0002082, whisper_loss=0.09341, over 3904513.21 frames. 
], batch size: 77, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:22:54,331 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 38 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 05:22:56,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2024-08-11 05:23:07,720 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-11 05:23:10,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=937190.0, ans=0.125 2024-08-11 05:23:11,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=937190.0, ans=0.125 2024-08-11 05:23:16,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=937190.0, ans=0.125 2024-08-11 05:23:31,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=937290.0, ans=0.1 2024-08-11 05:23:44,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=937390.0, ans=0.1 2024-08-11 05:23:49,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=937490.0, ans=0.0 2024-08-11 05:23:56,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=937490.0, ans=0.125 2024-08-11 05:24:06,323 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6800, loss[loss=0.09967, beats_loss=0.01082, ecapa_loss=0.0002312, whisper_loss=0.08654, over 21065.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01153, ecapa_loss=0.0002088, whisper_loss=0.09352, over 3920467.53 frames. 
], batch size: 88, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:24:22,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=937690.0, ans=0.0 2024-08-11 05:24:27,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.97 vs. limit=10.0 2024-08-11 05:24:32,616 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 12 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 05:24:41,011 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 05:25:05,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.742e+01 3.088e+01 3.392e+01 5.512e+01, threshold=6.176e+01, percent-clipped=0.0 2024-08-11 05:25:10,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=937990.0, ans=0.125 2024-08-11 05:25:18,772 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6850, loss[loss=0.1026, beats_loss=0.01229, ecapa_loss=0.0002047, whisper_loss=0.08829, over 18094.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0115, ecapa_loss=0.0002095, whisper_loss=0.09265, over 3898230.40 frames. ], batch size: 71, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:25:18,946 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 05:25:33,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.49 vs. limit=15.0 2024-08-11 05:25:55,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=938290.0, ans=0.125 2024-08-11 05:25:59,243 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
13 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 05:26:04,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=938390.0, ans=0.95 2024-08-11 05:26:08,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=15.0 2024-08-11 05:26:19,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=938490.0, ans=0.2 2024-08-11 05:26:29,809 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 26 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-11 05:26:33,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6900, loss[loss=0.09096, beats_loss=0.01409, ecapa_loss=0.0001508, whisper_loss=0.07536, over 21863.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01147, ecapa_loss=0.0002096, whisper_loss=0.09262, over 3853917.47 frames. ], batch size: 84, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:26:56,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0 2024-08-11 05:26:58,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.31 vs. limit=22.5 2024-08-11 05:27:01,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=938690.0, ans=0.0 2024-08-11 05:27:12,466 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.35 vs. 
limit=10.0 2024-08-11 05:27:18,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=938890.0, ans=0.1 2024-08-11 05:27:21,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2024-08-11 05:27:25,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=938890.0, ans=0.1 2024-08-11 05:27:26,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=30.85 vs. limit=22.5 2024-08-11 05:27:34,739 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.641e+01 3.049e+01 3.440e+01 6.351e+01, threshold=6.099e+01, percent-clipped=1.0 2024-08-11 05:27:40,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=938990.0, ans=0.0 2024-08-11 05:27:48,630 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 6950, loss[loss=0.08754, beats_loss=0.01388, ecapa_loss=0.0002205, whisper_loss=0.07145, over 20376.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0115, ecapa_loss=0.0002084, whisper_loss=0.09284, over 3876313.97 frames. ], batch size: 88, lr: 9.01e-03, grad_scale: 2251799813685248.0 2024-08-11 05:27:51,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=939090.0, ans=0.1 2024-08-11 05:28:00,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2024-08-11 05:28:22,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.39 vs. 
limit=22.5 2024-08-11 05:28:29,103 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 23 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-11 05:28:48,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=939490.0, ans=0.125 2024-08-11 05:28:52,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=939490.0, ans=0.0 2024-08-11 05:28:54,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=939490.0, ans=0.2 2024-08-11 05:29:01,207 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7000, loss[loss=0.1084, beats_loss=0.01307, ecapa_loss=0.0001605, whisper_loss=0.09371, over 22941.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01148, ecapa_loss=0.0002091, whisper_loss=0.09309, over 3876197.91 frames. ], batch size: 91, lr: 9.01e-03, grad_scale: 2251799813685248.0 2024-08-11 05:29:11,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=939590.0, ans=0.125 2024-08-11 05:29:14,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=939690.0, ans=0.125 2024-08-11 05:29:20,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=939690.0, ans=0.07 2024-08-11 05:29:26,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=939690.0, ans=0.125 2024-08-11 05:29:31,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=939790.0, ans=0.2 2024-08-11 05:29:38,700 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
16 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 05:29:42,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=939790.0, ans=0.0 2024-08-11 05:29:49,095 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-11 05:29:59,844 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.694e+01 2.915e+01 3.195e+01 8.375e+01, threshold=5.830e+01, percent-clipped=1.0 2024-08-11 05:30:00,065 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 05:30:01,443 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 05:30:03,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=939990.0, ans=0.125 2024-08-11 05:30:11,985 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7050, loss[loss=0.1018, beats_loss=0.01148, ecapa_loss=0.0001732, whisper_loss=0.08854, over 17757.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01151, ecapa_loss=0.0002089, whisper_loss=0.09249, over 3881029.80 frames. ], batch size: 70, lr: 9.01e-03, grad_scale: 4503599627370496.0 2024-08-11 05:30:22,039 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 05:30:32,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=940190.0, ans=0.125 2024-08-11 05:30:35,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.38 vs. 
limit=15.0 2024-08-11 05:30:43,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=940290.0, ans=0.0 2024-08-11 05:31:01,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=940390.0, ans=0.1 2024-08-11 05:31:22,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7100, loss[loss=0.1368, beats_loss=0.008892, ecapa_loss=0.0001958, whisper_loss=0.1259, over 20141.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01147, ecapa_loss=0.0002079, whisper_loss=0.09295, over 3881281.05 frames. ], batch size: 76, lr: 9.01e-03, grad_scale: 4503599627370496.0 2024-08-11 05:31:22,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=940590.0, ans=0.1 2024-08-11 05:31:31,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=940590.0, ans=0.0 2024-08-11 05:31:31,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=940590.0, ans=0.2 2024-08-11 05:31:36,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=940690.0, ans=0.95 2024-08-11 05:31:42,432 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
27 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 05:31:54,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=940790.0, ans=0.125 2024-08-11 05:32:00,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=940790.0, ans=0.125 2024-08-11 05:32:06,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=940890.0, ans=0.1 2024-08-11 05:32:09,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.56 vs. limit=12.0 2024-08-11 05:32:11,786 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 05:32:20,672 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.694e+01 2.982e+01 3.283e+01 5.309e+01, threshold=5.963e+01, percent-clipped=0.0 2024-08-11 05:32:28,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=940990.0, ans=0.0 2024-08-11 05:32:31,418 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 05:32:33,904 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7150, loss[loss=0.1172, beats_loss=0.01223, ecapa_loss=0.0001965, whisper_loss=0.103, over 20370.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01141, ecapa_loss=0.0002087, whisper_loss=0.09383, over 3880895.90 frames. 
], batch size: 83, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:32:39,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=941090.0, ans=0.125 2024-08-11 05:32:50,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=941190.0, ans=0.125 2024-08-11 05:32:53,054 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 05:32:53,783 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=22.5 2024-08-11 05:32:58,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=941190.0, ans=0.0 2024-08-11 05:33:15,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=941290.0, ans=0.125 2024-08-11 05:33:36,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=941490.0, ans=0.1 2024-08-11 05:33:38,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=941490.0, ans=0.0 2024-08-11 05:33:40,219 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 18 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-11 05:33:40,504 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.285e+03 2024-08-11 05:33:44,798 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
23 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 05:33:45,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=941490.0, ans=0.125 2024-08-11 05:33:50,102 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7200, loss[loss=0.08637, beats_loss=0.01336, ecapa_loss=0.000188, whisper_loss=0.07113, over 22189.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01145, ecapa_loss=0.0002083, whisper_loss=0.09355, over 3886703.93 frames. ], batch size: 94, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:33:50,925 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=12.0 2024-08-11 05:33:55,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=941590.0, ans=0.125 2024-08-11 05:34:05,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=941690.0, ans=0.0 2024-08-11 05:34:06,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=941690.0, ans=0.125 2024-08-11 05:34:16,896 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 05:34:37,523 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 05:34:46,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=941890.0, ans=0.125 2024-08-11 05:34:53,097 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.691e+01 3.038e+01 3.510e+01 5.388e+01, threshold=6.075e+01, percent-clipped=0.0 2024-08-11 05:35:01,904 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 05:35:06,358 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7250, loss[loss=0.07352, beats_loss=0.01376, ecapa_loss=0.0001992, whisper_loss=0.05776, over 14749.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01144, ecapa_loss=0.0002077, whisper_loss=0.09327, over 3863713.21 frames. ], batch size: 59, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:35:10,751 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 05:35:24,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0 2024-08-11 05:35:37,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=942290.0, ans=0.125 2024-08-11 05:35:39,488 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 05:35:44,165 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 05:35:44,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=942290.0, ans=0.0 2024-08-11 05:35:45,102 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0 2024-08-11 05:35:51,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=942390.0, ans=0.0 2024-08-11 05:36:00,360 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-11 05:36:24,014 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7300, loss[loss=0.1131, beats_loss=0.01047, ecapa_loss=0.0002144, whisper_loss=0.1005, over 22709.00 frames. 
], tot_loss[loss=0.1067, beats_loss=0.01141, ecapa_loss=0.0002082, whisper_loss=0.0932, over 3842130.66 frames. ], batch size: 94, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:36:26,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.58 vs. limit=6.0 2024-08-11 05:36:37,556 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-08-11 05:36:41,835 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-11 05:36:42,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=942690.0, ans=0.125 2024-08-11 05:36:46,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=942690.0, ans=0.2 2024-08-11 05:36:52,047 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-11 05:36:59,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2024-08-11 05:37:02,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.81 vs. limit=10.0 2024-08-11 05:37:15,917 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
18 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 05:37:28,118 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.619e+01 2.865e+01 3.274e+01 5.323e+01, threshold=5.731e+01, percent-clipped=0.0 2024-08-11 05:37:30,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=942990.0, ans=0.2 2024-08-11 05:37:40,840 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 05:37:42,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7350, loss[loss=0.09712, beats_loss=0.01385, ecapa_loss=0.0002039, whisper_loss=0.08124, over 21653.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01139, ecapa_loss=0.0002085, whisper_loss=0.0931, over 3841069.13 frames. ], batch size: 90, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:37:56,500 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 05:37:56,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=943090.0, ans=0.5 2024-08-11 05:38:16,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=943290.0, ans=0.1 2024-08-11 05:38:19,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=943290.0, ans=0.125 2024-08-11 05:38:28,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.02 vs. 
limit=12.0 2024-08-11 05:38:36,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=943390.0, ans=0.125 2024-08-11 05:38:49,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=943490.0, ans=0.125 2024-08-11 05:38:55,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=943490.0, ans=0.125 2024-08-11 05:39:04,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7400, loss[loss=0.1075, beats_loss=0.01122, ecapa_loss=0.0002098, whisper_loss=0.09418, over 22506.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01139, ecapa_loss=0.0002087, whisper_loss=0.09378, over 3859196.44 frames. ], batch size: 91, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:39:22,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=943690.0, ans=0.125 2024-08-11 05:39:22,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=943690.0, ans=0.025 2024-08-11 05:39:24,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=943690.0, ans=10.0 2024-08-11 05:39:25,200 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2024-08-11 05:39:28,926 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2024-08-11 05:39:54,233 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
36 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 05:39:55,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=943890.0, ans=0.125 2024-08-11 05:39:58,794 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 16 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-11 05:40:00,571 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-11 05:40:12,240 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.754e+01 3.134e+01 3.578e+01 6.308e+01, threshold=6.268e+01, percent-clipped=2.0 2024-08-11 05:40:25,917 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 05:40:27,712 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7450, loss[loss=0.1001, beats_loss=0.01158, ecapa_loss=0.0001514, whisper_loss=0.08697, over 16795.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01137, ecapa_loss=0.0002083, whisper_loss=0.09408, over 3868957.55 frames. ], batch size: 64, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:40:42,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=944190.0, ans=0.0 2024-08-11 05:40:44,370 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.61 vs. limit=10.0 2024-08-11 05:40:50,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=944190.0, ans=0.125 2024-08-11 05:41:04,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=944290.0, ans=0.125 2024-08-11 05:41:32,736 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
24 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 05:41:36,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=944490.0, ans=0.2 2024-08-11 05:41:43,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.29 vs. limit=15.0 2024-08-11 05:41:48,335 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 05:41:48,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=944490.0, ans=0.125 2024-08-11 05:41:50,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7500, loss[loss=0.1114, beats_loss=0.01108, ecapa_loss=0.0001952, whisper_loss=0.09835, over 22306.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01141, ecapa_loss=0.0002082, whisper_loss=0.0937, over 3856589.66 frames. ], batch size: 88, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:41:57,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=944590.0, ans=0.07 2024-08-11 05:42:09,391 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 05:42:21,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2024-08-11 05:42:33,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.41 vs. limit=15.0 2024-08-11 05:42:35,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=944790.0, ans=0.125 2024-08-11 05:42:44,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.00 vs. 
limit=15.0 2024-08-11 05:42:54,214 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.583e+01 2.883e+01 3.295e+01 6.050e+01, threshold=5.765e+01, percent-clipped=0.0 2024-08-11 05:43:08,085 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7550, loss[loss=0.1024, beats_loss=0.01284, ecapa_loss=0.000226, whisper_loss=0.08726, over 22209.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01142, ecapa_loss=0.0002084, whisper_loss=0.09326, over 3855237.31 frames. ], batch size: 93, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:43:15,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=945090.0, ans=0.0 2024-08-11 05:44:03,774 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-11 05:44:08,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=945490.0, ans=0.0 2024-08-11 05:44:17,618 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:44:22,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=945490.0, ans=0.0 2024-08-11 05:44:25,163 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7600, loss[loss=0.08898, beats_loss=0.01403, ecapa_loss=0.0001968, whisper_loss=0.07298, over 16149.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01148, ecapa_loss=0.0002083, whisper_loss=0.09261, over 3861935.05 frames. ], batch size: 68, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:45:03,416 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.37 vs. 
limit=10.0 2024-08-11 05:45:27,872 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.611e+01 2.976e+01 3.513e+01 5.739e+01, threshold=5.952e+01, percent-clipped=0.0 2024-08-11 05:45:28,051 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 05:45:31,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=945990.0, ans=0.125 2024-08-11 05:45:33,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=945990.0, ans=0.125 2024-08-11 05:45:41,036 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7650, loss[loss=0.08938, beats_loss=0.01148, ecapa_loss=0.0002596, whisper_loss=0.07531, over 17011.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01149, ecapa_loss=0.0002079, whisper_loss=0.09252, over 3878614.73 frames. ], batch size: 73, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:45:48,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=946090.0, ans=0.0 2024-08-11 05:46:31,764 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 05:46:49,087 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 05:46:52,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=946490.0, ans=0.125 2024-08-11 05:46:54,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=946490.0, ans=0.1 2024-08-11 05:46:54,761 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.17 vs. 
limit=15.0 2024-08-11 05:46:58,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7700, loss[loss=0.1132, beats_loss=0.01189, ecapa_loss=0.0002091, whisper_loss=0.09921, over 18444.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01151, ecapa_loss=0.0002076, whisper_loss=0.09256, over 3888076.33 frames. ], batch size: 75, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:47:12,369 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:47:32,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=946790.0, ans=0.1 2024-08-11 05:47:33,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=946790.0, ans=0.125 2024-08-11 05:47:59,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=15.0 2024-08-11 05:48:03,344 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.753e+01 2.991e+01 3.515e+01 5.898e+01, threshold=5.981e+01, percent-clipped=0.0 2024-08-11 05:48:17,921 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7750, loss[loss=0.08577, beats_loss=0.01357, ecapa_loss=0.0001749, whisper_loss=0.07044, over 20411.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0115, ecapa_loss=0.0002069, whisper_loss=0.09245, over 3897375.00 frames. 
], batch size: 84, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:48:22,875 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:48:40,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=947190.0, ans=0.0 2024-08-11 05:48:40,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=947190.0, ans=0.125 2024-08-11 05:48:51,129 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 05:49:02,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=947290.0, ans=0.125 2024-08-11 05:49:07,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=947390.0, ans=0.0 2024-08-11 05:49:24,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2024-08-11 05:49:36,165 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7800, loss[loss=0.09196, beats_loss=0.01083, ecapa_loss=0.0002669, whisper_loss=0.07846, over 19627.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01144, ecapa_loss=0.000207, whisper_loss=0.09264, over 3905260.44 frames. 
], batch size: 84, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:49:38,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=947590.0, ans=0.1 2024-08-11 05:50:00,544 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:50:04,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=947690.0, ans=0.1 2024-08-11 05:50:11,380 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 13 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 05:50:17,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=947790.0, ans=0.2 2024-08-11 05:50:22,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=947890.0, ans=0.0 2024-08-11 05:50:25,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=947890.0, ans=0.125 2024-08-11 05:50:34,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=947890.0, ans=0.125 2024-08-11 05:50:39,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.753e+01 3.128e+01 3.537e+01 5.360e+01, threshold=6.257e+01, percent-clipped=0.0 2024-08-11 05:50:53,099 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7850, loss[loss=0.08868, beats_loss=0.01212, ecapa_loss=0.0001982, whisper_loss=0.07458, over 17158.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0115, ecapa_loss=0.0002057, whisper_loss=0.09296, over 3904574.41 frames. ], batch size: 68, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:50:53,369 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
19 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 05:50:55,210 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 05:51:06,084 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-08-11 05:51:08,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=948190.0, ans=0.125 2024-08-11 05:51:13,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=948190.0, ans=0.0 2024-08-11 05:51:25,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2024-08-11 05:51:28,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=948290.0, ans=0.125 2024-08-11 05:51:30,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=948290.0, ans=0.125 2024-08-11 05:51:46,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.10 vs. limit=22.5 2024-08-11 05:52:09,710 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7900, loss[loss=0.1476, beats_loss=0.008561, ecapa_loss=0.0002131, whisper_loss=0.1369, over 19754.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01147, ecapa_loss=0.000207, whisper_loss=0.09298, over 3890405.86 frames. ], batch size: 71, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:52:09,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=948590.0, ans=0.125 2024-08-11 05:52:15,163 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
22 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-11 05:52:21,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2024-08-11 05:52:36,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=948690.0, ans=15.0 2024-08-11 05:52:38,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=948690.0, ans=0.125 2024-08-11 05:52:46,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=948790.0, ans=0.125 2024-08-11 05:52:57,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=948890.0, ans=0.95 2024-08-11 05:53:03,901 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.073e+00 2024-08-11 05:53:14,404 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.621e+01 3.000e+01 3.506e+01 5.251e+01, threshold=6.001e+01, percent-clipped=0.0 2024-08-11 05:53:15,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=8.0 2024-08-11 05:53:21,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=948990.0, ans=0.125 2024-08-11 05:53:29,017 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 7950, loss[loss=0.08971, beats_loss=0.01181, ecapa_loss=0.0002014, whisper_loss=0.07588, over 15333.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01146, ecapa_loss=0.0002066, whisper_loss=0.09275, over 3868020.67 frames. 
], batch size: 60, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:53:33,711 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-11 05:53:36,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=949090.0, ans=0.2 2024-08-11 05:53:40,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=15.0 2024-08-11 05:53:43,188 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 05:54:06,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=949290.0, ans=0.2 2024-08-11 05:54:10,060 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 05:54:20,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=949390.0, ans=0.0 2024-08-11 05:54:28,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=949390.0, ans=0.0 2024-08-11 05:54:35,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=949490.0, ans=0.125 2024-08-11 05:54:41,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=949490.0, ans=0.0 2024-08-11 05:54:50,015 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8000, loss[loss=0.1068, beats_loss=0.01145, ecapa_loss=0.0001933, whisper_loss=0.09338, over 19074.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01148, ecapa_loss=0.0002034, whisper_loss=0.09307, over 3887028.54 frames. 
], batch size: 75, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:55:09,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=949690.0, ans=0.125 2024-08-11 05:55:27,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=949790.0, ans=0.125 2024-08-11 05:55:32,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=949790.0, ans=0.125 2024-08-11 05:55:33,888 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 05:55:54,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=949890.0, ans=0.125 2024-08-11 05:55:58,333 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.705e+01 3.037e+01 3.592e+01 7.289e+01, threshold=6.074e+01, percent-clipped=2.0 2024-08-11 05:56:10,803 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8050, loss[loss=0.1164, beats_loss=0.007693, ecapa_loss=0.000199, whisper_loss=0.1067, over 15993.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01159, ecapa_loss=0.0002036, whisper_loss=0.0921, over 3875135.48 frames. ], batch size: 59, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:56:23,676 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 05:56:25,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.87 vs. 
limit=15.0 2024-08-11 05:56:33,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=950190.0, ans=0.125 2024-08-11 05:56:55,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=12.0 2024-08-11 05:57:24,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=950490.0, ans=0.0 2024-08-11 05:57:28,402 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8100, loss[loss=0.09979, beats_loss=0.01109, ecapa_loss=0.000258, whisper_loss=0.08612, over 19424.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01158, ecapa_loss=0.000204, whisper_loss=0.09145, over 3856434.76 frames. ], batch size: 88, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:57:29,918 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 05:57:31,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5 2024-08-11 05:57:55,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=950690.0, ans=0.125 2024-08-11 05:58:15,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=950790.0, ans=0.1 2024-08-11 05:58:17,772 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
37 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 05:58:19,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=950890.0, ans=0.025 2024-08-11 05:58:36,874 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.727e+01 3.067e+01 3.354e+01 4.801e+01, threshold=6.134e+01, percent-clipped=0.0 2024-08-11 05:58:38,299 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 05:58:41,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=950990.0, ans=0.0 2024-08-11 05:58:44,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=950990.0, ans=0.0 2024-08-11 05:58:48,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=950990.0, ans=0.125 2024-08-11 05:58:49,877 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 05:58:51,412 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8150, loss[loss=0.09553, beats_loss=0.01127, ecapa_loss=0.0001804, whisper_loss=0.08246, over 14322.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01154, ecapa_loss=0.0002038, whisper_loss=0.09259, over 3885902.25 frames. ], batch size: 56, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:58:53,781 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.55 vs. limit=15.0 2024-08-11 05:59:16,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=951190.0, ans=0.1 2024-08-11 05:59:19,586 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
26 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-11 05:59:50,797 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 24 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-11 05:59:56,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=951490.0, ans=0.0 2024-08-11 05:59:57,739 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 06:00:02,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=951490.0, ans=0.1 2024-08-11 06:00:13,762 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8200, loss[loss=0.1059, beats_loss=0.01227, ecapa_loss=0.0002077, whisper_loss=0.0916, over 16866.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0115, ecapa_loss=0.0002051, whisper_loss=0.09248, over 3883319.49 frames. ], batch size: 65, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:00:16,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=951590.0, ans=0.0 2024-08-11 06:00:28,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=951690.0, ans=0.125 2024-08-11 06:00:48,345 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=15.0 2024-08-11 06:00:51,050 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 06:01:07,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=951890.0, ans=0.0 2024-08-11 06:01:14,518 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 06:01:18,642 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
30 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 06:01:19,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=951990.0, ans=0.125 2024-08-11 06:01:19,997 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.662e+01 3.047e+01 3.528e+01 2.595e+02, threshold=6.093e+01, percent-clipped=1.0 2024-08-11 06:01:32,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=951990.0, ans=0.0 2024-08-11 06:01:34,463 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8250, loss[loss=0.08664, beats_loss=0.01383, ecapa_loss=0.0001944, whisper_loss=0.07086, over 21192.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01153, ecapa_loss=0.0002069, whisper_loss=0.09271, over 3902043.38 frames. ], batch size: 87, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:01:53,948 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 06:01:54,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=952190.0, ans=0.125 2024-08-11 06:02:00,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=952190.0, ans=0.125 2024-08-11 06:02:07,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=952290.0, ans=0.125 2024-08-11 06:02:14,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=952290.0, ans=0.5 2024-08-11 06:02:19,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.13 vs. 
limit=15.0 2024-08-11 06:02:28,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=952390.0, ans=0.125 2024-08-11 06:02:37,910 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0 2024-08-11 06:02:54,275 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8300, loss[loss=0.1098, beats_loss=0.01171, ecapa_loss=0.0002023, whisper_loss=0.09605, over 21946.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01152, ecapa_loss=0.0002075, whisper_loss=0.09227, over 3875919.54 frames. ], batch size: 87, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:02:55,552 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-08-11 06:02:56,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=952590.0, ans=0.1 2024-08-11 06:02:56,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.82 vs. limit=22.5 2024-08-11 06:02:58,199 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
13 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 06:03:04,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=952590.0, ans=0.125 2024-08-11 06:03:28,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=952790.0, ans=0.125 2024-08-11 06:03:36,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=952790.0, ans=0.0 2024-08-11 06:03:58,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.727e+01 2.981e+01 3.576e+01 6.756e+01, threshold=5.962e+01, percent-clipped=1.0 2024-08-11 06:04:04,336 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-11 06:04:12,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8350, loss[loss=0.09439, beats_loss=0.01493, ecapa_loss=0.0001499, whisper_loss=0.07796, over 19205.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01155, ecapa_loss=0.0002076, whisper_loss=0.09196, over 3892390.14 frames. ], batch size: 76, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:04:24,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=953090.0, ans=0.125 2024-08-11 06:04:25,470 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-11 06:04:53,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=953290.0, ans=0.125 2024-08-11 06:05:02,921 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 06:05:33,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8400, loss[loss=0.08251, beats_loss=0.01375, ecapa_loss=0.0002106, whisper_loss=0.06665, over 21573.00 frames. 
], tot_loss[loss=0.1059, beats_loss=0.01151, ecapa_loss=0.0002073, whisper_loss=0.0923, over 3912082.39 frames. ], batch size: 89, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:05:36,400 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 06:05:38,351 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 06:05:43,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=953590.0, ans=0.125 2024-08-11 06:05:49,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=953690.0, ans=0.0 2024-08-11 06:06:36,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=953890.0, ans=0.125 2024-08-11 06:06:40,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.819e+01 3.267e+01 3.747e+01 3.320e+02, threshold=6.533e+01, percent-clipped=4.0 2024-08-11 06:06:46,715 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 31 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 06:06:47,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=953990.0, ans=0.0 2024-08-11 06:06:47,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.30 vs. limit=10.0 2024-08-11 06:06:54,877 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8450, loss[loss=0.1034, beats_loss=0.01244, ecapa_loss=0.0001982, whisper_loss=0.08896, over 22539.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01139, ecapa_loss=0.0002083, whisper_loss=0.09314, over 3899738.25 frames. 
], batch size: 90, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:07:12,111 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 06:07:20,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=954190.0, ans=0.125 2024-08-11 06:07:43,936 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.15 vs. limit=6.0 2024-08-11 06:07:49,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.69 vs. limit=15.0 2024-08-11 06:07:58,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=954390.0, ans=0.1 2024-08-11 06:08:03,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=954490.0, ans=0.125 2024-08-11 06:08:15,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=954490.0, ans=0.0 2024-08-11 06:08:17,986 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8500, loss[loss=0.1035, beats_loss=0.01132, ecapa_loss=0.0001887, whisper_loss=0.09032, over 21354.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01135, ecapa_loss=0.000208, whisper_loss=0.09322, over 3884822.55 frames. ], batch size: 86, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:08:19,792 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-11 06:08:29,338 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
23 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 06:08:37,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=954690.0, ans=0.125 2024-08-11 06:09:15,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=954890.0, ans=0.1 2024-08-11 06:09:21,133 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 06:09:22,720 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 18 from LS+wenet, 32 from Vox, 28 fro AS 2024-08-11 06:09:22,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=954990.0, ans=0.125 2024-08-11 06:09:25,941 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.679e+01 3.057e+01 3.369e+01 5.558e+01, threshold=6.114e+01, percent-clipped=0.0 2024-08-11 06:09:32,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=15.0 2024-08-11 06:09:39,825 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8550, loss[loss=0.1378, beats_loss=0.009949, ecapa_loss=0.0001789, whisper_loss=0.126, over 18029.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0114, ecapa_loss=0.0002061, whisper_loss=0.09283, over 3877376.64 frames. ], batch size: 68, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:09:39,984 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 06:09:48,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=955090.0, ans=0.125 2024-08-11 06:09:49,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=955090.0, ans=0.1 2024-08-11 06:10:15,626 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 38 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 06:10:35,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=955390.0, ans=0.0 2024-08-11 06:10:47,080 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-11 06:10:50,076 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 06:10:53,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=955490.0, ans=0.1 2024-08-11 06:11:05,354 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8600, loss[loss=0.1097, beats_loss=0.0101, ecapa_loss=0.0002225, whisper_loss=0.09738, over 17622.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01145, ecapa_loss=0.0002065, whisper_loss=0.09307, over 3864324.41 frames. ], batch size: 70, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:11:13,031 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 06:11:17,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=955590.0, ans=0.125 2024-08-11 06:11:30,466 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
25 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-11 06:11:56,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=955890.0, ans=0.125 2024-08-11 06:12:11,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=955990.0, ans=0.05 2024-08-11 06:12:14,030 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.782e+01 3.171e+01 3.818e+01 6.085e+01, threshold=6.342e+01, percent-clipped=0.0 2024-08-11 06:12:25,969 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 06:12:28,196 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.76 vs. limit=22.5 2024-08-11 06:12:28,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8650, loss[loss=0.1351, beats_loss=0.01072, ecapa_loss=0.0002021, whisper_loss=0.1223, over 21270.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01143, ecapa_loss=0.0002091, whisper_loss=0.09267, over 3855375.19 frames. ], batch size: 81, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:12:34,116 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 06:12:43,196 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 06:12:46,272 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 06:12:49,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. 
limit=6.0 2024-08-11 06:12:57,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=956190.0, ans=0.1 2024-08-11 06:13:05,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=956290.0, ans=0.125 2024-08-11 06:13:40,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2024-08-11 06:13:40,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-11 06:13:47,233 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 06:13:52,071 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8700, loss[loss=0.1223, beats_loss=0.009841, ecapa_loss=0.0001533, whisper_loss=0.1109, over 14841.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01141, ecapa_loss=0.0002083, whisper_loss=0.09323, over 3826639.23 frames. ], batch size: 54, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:13:52,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=956590.0, ans=0.125 2024-08-11 06:14:04,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=956590.0, ans=0.0 2024-08-11 06:14:26,767 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 06:14:27,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=956790.0, ans=0.0 2024-08-11 06:14:29,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=956790.0, ans=0.125 2024-08-11 06:14:29,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=956790.0, ans=0.1 2024-08-11 06:14:31,088 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.11 vs. limit=10.0 2024-08-11 06:14:43,545 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 16 from LS+wenet, 30 from Vox, 44 fro AS 2024-08-11 06:14:54,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2024-08-11 06:14:54,934 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 06:14:57,438 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.743e+01 3.051e+01 3.561e+01 4.836e+01, threshold=6.102e+01, percent-clipped=0.0 2024-08-11 06:15:07,032 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-11 06:15:11,980 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8750, loss[loss=0.09614, beats_loss=0.01017, ecapa_loss=0.000254, whisper_loss=0.08343, over 15988.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01134, ecapa_loss=0.0002088, whisper_loss=0.09385, over 3830438.62 frames. ], batch size: 64, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:15:48,896 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
33 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 06:15:49,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=957290.0, ans=0.1 2024-08-11 06:16:00,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=957390.0, ans=0.0 2024-08-11 06:16:05,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=957390.0, ans=0.0 2024-08-11 06:16:06,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=957390.0, ans=0.025 2024-08-11 06:16:12,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=957390.0, ans=0.0 2024-08-11 06:16:20,097 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 24 from LS+wenet, 13 from Vox, 18 fro AS 2024-08-11 06:16:29,515 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8800, loss[loss=0.1239, beats_loss=0.01039, ecapa_loss=0.000181, whisper_loss=0.1117, over 16892.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01138, ecapa_loss=0.0002081, whisper_loss=0.09399, over 3850525.95 frames. ], batch size: 61, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:16:47,890 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. 
limit=15.0 2024-08-11 06:16:49,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=957690.0, ans=0.0 2024-08-11 06:16:58,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=957790.0, ans=0.0 2024-08-11 06:17:13,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=957790.0, ans=0.125 2024-08-11 06:17:33,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.553e+01 2.761e+01 3.256e+01 4.911e+01, threshold=5.522e+01, percent-clipped=0.0 2024-08-11 06:17:36,302 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.249e-03 2024-08-11 06:17:49,413 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8850, loss[loss=0.09865, beats_loss=0.01162, ecapa_loss=0.0002145, whisper_loss=0.08488, over 16807.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01144, ecapa_loss=0.0002061, whisper_loss=0.09404, over 3872878.24 frames. ], batch size: 67, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:17:54,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=958090.0, ans=0.04949747468305833 2024-08-11 06:18:11,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=958190.0, ans=0.125 2024-08-11 06:18:11,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=958190.0, ans=0.0 2024-08-11 06:18:26,805 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 06:18:29,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=958290.0, ans=15.0 2024-08-11 06:18:29,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=958290.0, ans=0.1 2024-08-11 06:18:57,365 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 06:19:10,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8900, loss[loss=0.09467, beats_loss=0.01387, ecapa_loss=0.0002299, whisper_loss=0.0785, over 23391.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01139, ecapa_loss=0.0002079, whisper_loss=0.09474, over 3859472.15 frames. ], batch size: 93, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:19:21,944 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 06:19:49,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=958790.0, ans=0.125 2024-08-11 06:20:13,092 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.703e+01 3.133e+01 3.628e+01 5.499e+01, threshold=6.267e+01, percent-clipped=0.0 2024-08-11 06:20:26,391 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 8950, loss[loss=0.1155, beats_loss=0.01015, ecapa_loss=0.0001889, whisper_loss=0.1034, over 18818.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0114, ecapa_loss=0.0002072, whisper_loss=0.09428, over 3856162.51 frames. ], batch size: 71, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:20:27,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=959090.0, ans=0.0 2024-08-11 06:20:29,272 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 06:20:41,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=959190.0, ans=0.125 2024-08-11 06:20:45,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=959190.0, ans=0.2 2024-08-11 06:21:01,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=959290.0, ans=0.0 2024-08-11 06:21:22,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=959390.0, ans=0.0 2024-08-11 06:21:29,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=959490.0, ans=0.2 2024-08-11 06:21:34,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=959490.0, ans=0.125 2024-08-11 06:21:39,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=959590.0, ans=0.0 2024-08-11 06:21:40,476 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9000, loss[loss=0.1187, beats_loss=0.01144, ecapa_loss=0.0002012, whisper_loss=0.1052, over 22643.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01136, ecapa_loss=0.0002065, whisper_loss=0.09424, over 3845681.51 frames. ], batch size: 87, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:21:40,477 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 06:22:22,389 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on ASR_libri: loss=0.2572, beats_loss=0, ecapa_loss=0.0006695, whisper_loss=0.2505, over 922467.00 frames. 2024-08-11 06:22:40,939 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on SV_voxceleb1: loss=0.005671, beats_loss=0, ecapa_loss=0.0005671, whisper_loss=0, over 939242.00 frames. 
2024-08-11 06:23:54,773 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5508, 1.9976, 1.6488, 1.1632], device='cuda:2') 2024-08-11 06:24:43,664 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on AT_audioset: loss=0.0256, beats_loss=0.0256, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 06:24:43,669 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 06:25:07,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=959690.0, ans=0.125 2024-08-11 06:25:14,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=959790.0, ans=0.125 2024-08-11 06:25:30,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=959890.0, ans=0.125 2024-08-11 06:25:48,929 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0 2024-08-11 06:25:49,543 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.675e+01 2.932e+01 3.308e+01 5.321e+01, threshold=5.865e+01, percent-clipped=0.0 2024-08-11 06:25:49,758 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 06:26:02,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=960090.0, ans=0.125 2024-08-11 06:26:03,786 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9050, loss[loss=0.1119, beats_loss=0.01293, ecapa_loss=0.0001674, whisper_loss=0.09733, over 24066.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01132, ecapa_loss=0.0002063, whisper_loss=0.09442, over 3840393.74 frames. 
], batch size: 94, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:26:21,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=960190.0, ans=0.0 2024-08-11 06:26:29,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=960190.0, ans=0.125 2024-08-11 06:26:39,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=960190.0, ans=0.125 2024-08-11 06:26:40,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=960290.0, ans=0.0 2024-08-11 06:26:55,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=960290.0, ans=0.125 2024-08-11 06:27:22,025 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 06:27:32,532 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9100, loss[loss=0.1093, beats_loss=0.01314, ecapa_loss=0.0001875, whisper_loss=0.09428, over 17413.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01131, ecapa_loss=0.0002081, whisper_loss=0.09406, over 3842866.64 frames. ], batch size: 71, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:27:35,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-11 06:27:42,342 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
20 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 06:27:42,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=960590.0, ans=0.0 2024-08-11 06:27:44,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=960590.0, ans=0.125 2024-08-11 06:27:54,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=960690.0, ans=0.0 2024-08-11 06:28:08,974 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 06:28:09,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=960690.0, ans=0.125 2024-08-11 06:28:15,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=960790.0, ans=0.125 2024-08-11 06:28:30,660 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 06:28:51,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=960990.0, ans=0.125 2024-08-11 06:28:51,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=960990.0, ans=0.04949747468305833 2024-08-11 06:28:52,660 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.825e+01 3.107e+01 3.810e+01 5.498e+01, threshold=6.214e+01, percent-clipped=0.0 2024-08-11 06:29:10,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9150, loss[loss=0.09019, beats_loss=0.0136, ecapa_loss=0.0002152, whisper_loss=0.07444, over 21957.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01138, ecapa_loss=0.0002075, whisper_loss=0.09385, over 3859916.34 frames. 
], batch size: 93, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:29:16,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=961090.0, ans=0.1 2024-08-11 06:29:18,272 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 06:29:18,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=961090.0, ans=0.025 2024-08-11 06:29:30,095 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 06:29:38,896 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=12.0 2024-08-11 06:30:25,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=12.0 2024-08-11 06:30:27,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=961490.0, ans=10.0 2024-08-11 06:30:35,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=961490.0, ans=0.125 2024-08-11 06:30:43,999 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9200, loss[loss=0.1047, beats_loss=0.009008, ecapa_loss=0.0002073, whisper_loss=0.09359, over 19543.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01145, ecapa_loss=0.0002063, whisper_loss=0.09363, over 3891949.11 frames. ], batch size: 75, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:31:27,288 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
23 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-11 06:31:27,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=961790.0, ans=0.125 2024-08-11 06:32:06,581 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.686e+01 3.168e+01 3.590e+01 6.490e+01, threshold=6.336e+01, percent-clipped=1.0 2024-08-11 06:32:14,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.84 vs. limit=12.0 2024-08-11 06:32:21,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=961990.0, ans=0.2 2024-08-11 06:32:26,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9250, loss[loss=0.1189, beats_loss=0.01108, ecapa_loss=0.0002311, whisper_loss=0.1055, over 22052.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01145, ecapa_loss=0.0002065, whisper_loss=0.09279, over 3912796.11 frames. ], batch size: 91, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:33:06,985 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 06:33:11,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.56 vs. limit=12.0 2024-08-11 06:33:15,267 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 06:33:18,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.30 vs. 
limit=15.0 2024-08-11 06:33:29,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=962390.0, ans=0.125 2024-08-11 06:33:43,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=962490.0, ans=0.0 2024-08-11 06:33:49,751 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9300, loss[loss=0.1036, beats_loss=0.01064, ecapa_loss=0.000212, whisper_loss=0.09079, over 23114.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01147, ecapa_loss=0.0002061, whisper_loss=0.09296, over 3908005.79 frames. ], batch size: 91, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:34:09,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=962690.0, ans=0.0 2024-08-11 06:34:19,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.18 vs. limit=10.0 2024-08-11 06:34:50,106 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.791e+01 3.053e+01 3.524e+01 6.115e+01, threshold=6.107e+01, percent-clipped=0.0 2024-08-11 06:35:03,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9350, loss[loss=0.1029, beats_loss=0.01154, ecapa_loss=0.0001896, whisper_loss=0.08946, over 17699.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0114, ecapa_loss=0.0002056, whisper_loss=0.09348, over 3922143.74 frames. ], batch size: 68, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:35:13,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=963090.0, ans=0.0 2024-08-11 06:35:45,806 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.52 vs. 
limit=10.0 2024-08-11 06:35:52,000 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 06:35:52,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=963390.0, ans=0.125 2024-08-11 06:35:55,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=963390.0, ans=0.1 2024-08-11 06:36:15,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=963490.0, ans=0.0 2024-08-11 06:36:17,965 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9400, loss[loss=0.08235, beats_loss=0.009953, ecapa_loss=0.0002334, whisper_loss=0.07006, over 15703.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01146, ecapa_loss=0.0002059, whisper_loss=0.09319, over 3931753.98 frames. ], batch size: 63, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:36:18,155 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 06:36:23,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2024-08-11 06:36:25,259 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 06:36:29,403 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-11 06:36:48,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.30 vs. 
limit=15.0 2024-08-11 06:36:49,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=963790.0, ans=0.125 2024-08-11 06:36:52,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.09 vs. limit=22.5 2024-08-11 06:37:00,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=963890.0, ans=0.125 2024-08-11 06:37:08,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=963890.0, ans=0.09899494936611666 2024-08-11 06:37:18,851 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.687e+01 3.013e+01 3.513e+01 7.296e+01, threshold=6.026e+01, percent-clipped=1.0 2024-08-11 06:37:25,559 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 06:37:25,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=963990.0, ans=0.0 2024-08-11 06:37:32,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=12.0 2024-08-11 06:37:32,767 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9450, loss[loss=0.09031, beats_loss=0.01127, ecapa_loss=0.0002422, whisper_loss=0.07662, over 21641.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01149, ecapa_loss=0.0002061, whisper_loss=0.09289, over 3910831.45 frames. ], batch size: 92, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:37:59,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=964190.0, ans=0.2 2024-08-11 06:38:06,487 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
26 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 06:38:08,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-11 06:38:10,013 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0 2024-08-11 06:38:27,109 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 35 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-11 06:38:34,893 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-11 06:38:45,160 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 06:38:48,798 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9500, loss[loss=0.1254, beats_loss=0.01066, ecapa_loss=0.0001866, whisper_loss=0.1129, over 16836.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01147, ecapa_loss=0.0002067, whisper_loss=0.09292, over 3895545.99 frames. ], batch size: 65, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:38:51,744 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 06:38:51,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=964590.0, ans=0.0 2024-08-11 06:38:57,070 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.87 vs. 
limit=15.0 2024-08-11 06:39:04,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=964690.0, ans=0.1 2024-08-11 06:39:09,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=964690.0, ans=0.0 2024-08-11 06:39:15,000 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 16 from Vox, 38 from AS 2024-08-11 06:39:37,091 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.31 vs. limit=10.0 2024-08-11 06:39:39,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=964890.0, ans=0.125 2024-08-11 06:39:45,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=964890.0, ans=0.125 2024-08-11 06:39:46,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=964890.0, ans=0.2 2024-08-11 06:39:50,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.745e+01 3.159e+01 3.801e+01 1.108e+02, threshold=6.317e+01, percent-clipped=3.0 2024-08-11 06:39:50,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=964990.0, ans=0.1 2024-08-11 06:40:03,722 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9550, loss[loss=0.1145, beats_loss=0.01165, ecapa_loss=0.0001629, whisper_loss=0.1012, over 24390.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01148, ecapa_loss=0.0002075, whisper_loss=0.09252, over 3883172.90 frames. ], batch size: 93, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:40:15,479 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
27 from LS+wenet, 10 from Vox, 37 from AS 2024-08-11 06:40:21,360 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 from AS 2024-08-11 06:40:40,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=965290.0, ans=0.125 2024-08-11 06:40:58,025 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 from AS 2024-08-11 06:41:05,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0 2024-08-11 06:41:13,984 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9600, loss[loss=0.1137, beats_loss=0.01009, ecapa_loss=0.0002267, whisper_loss=0.1014, over 21824.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01148, ecapa_loss=0.0002075, whisper_loss=0.0932, over 3908233.68 frames. ], batch size: 89, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:41:19,580 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 14 from Vox, 30 from AS 2024-08-11 06:41:32,895 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 17 from Vox, 18 from AS 2024-08-11 06:41:36,180 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 16 from Vox, 26 from AS 2024-08-11 06:41:45,276 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 from AS 2024-08-11 06:41:56,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=965890.0, ans=0.09899494936611666 2024-08-11 06:41:56,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.64 vs. 
limit=12.0 2024-08-11 06:42:02,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=965890.0, ans=0.125 2024-08-11 06:42:07,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=965890.0, ans=0.2 2024-08-11 06:42:14,235 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.765e+01 3.049e+01 3.383e+01 4.788e+01, threshold=6.099e+01, percent-clipped=0.0 2024-08-11 06:42:28,378 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9650, loss[loss=0.08836, beats_loss=0.01337, ecapa_loss=0.0001858, whisper_loss=0.07314, over 17781.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01144, ecapa_loss=0.0002073, whisper_loss=0.09332, over 3880566.69 frames. ], batch size: 73, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:42:52,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=966190.0, ans=0.125 2024-08-11 06:43:00,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0 2024-08-11 06:43:05,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=966290.0, ans=0.0 2024-08-11 06:43:21,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=966390.0, ans=0.1 2024-08-11 06:43:24,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.01 vs. 
limit=12.0 2024-08-11 06:43:34,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=966490.0, ans=0.125 2024-08-11 06:43:34,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=966490.0, ans=0.125 2024-08-11 06:43:43,375 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9700, loss[loss=0.09454, beats_loss=0.01308, ecapa_loss=0.0001979, whisper_loss=0.07949, over 13929.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01146, ecapa_loss=0.0002097, whisper_loss=0.0929, over 3883062.50 frames. ], batch size: 58, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:43:47,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=966590.0, ans=0.0 2024-08-11 06:44:01,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=966690.0, ans=0.0 2024-08-11 06:44:28,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=966890.0, ans=0.2 2024-08-11 06:44:42,532 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.608e+01 2.892e+01 3.245e+01 5.119e+01, threshold=5.784e+01, percent-clipped=0.0 2024-08-11 06:44:55,496 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9750, loss[loss=0.09658, beats_loss=0.01032, ecapa_loss=0.0002662, whisper_loss=0.0836, over 21212.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01145, ecapa_loss=0.0002102, whisper_loss=0.09304, over 3888177.88 frames. ], batch size: 89, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:45:02,938 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
20 from LS+wenet, 28 from Vox, 34 from AS 2024-08-11 06:45:05,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=967090.0, ans=0.0 2024-08-11 06:45:11,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=967190.0, ans=0.125 2024-08-11 06:45:13,839 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 from AS 2024-08-11 06:45:20,954 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 06:46:07,802 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9800, loss[loss=0.1039, beats_loss=0.0107, ecapa_loss=0.0002658, whisper_loss=0.09053, over 15757.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01152, ecapa_loss=0.0002113, whisper_loss=0.09235, over 3850911.46 frames. ], batch size: 65, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:46:22,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=967690.0, ans=0.2 2024-08-11 06:46:24,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=967690.0, ans=0.125 2024-08-11 06:46:32,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5 2024-08-11 06:46:39,222 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 13 from Vox, 27 from AS 2024-08-11 06:46:41,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=967790.0, ans=0.125 2024-08-11 06:46:42,925 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.31 vs. 
limit=15.0 2024-08-11 06:46:44,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2024-08-11 06:47:06,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.643e+01 2.929e+01 3.455e+01 6.415e+01, threshold=5.858e+01, percent-clipped=3.0 2024-08-11 06:47:19,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9850, loss[loss=0.1347, beats_loss=0.01058, ecapa_loss=0.0001861, whisper_loss=0.1223, over 24076.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01149, ecapa_loss=0.0002108, whisper_loss=0.09307, over 3858832.19 frames. ], batch size: 94, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:47:35,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2024-08-11 06:47:41,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=968190.0, ans=0.125 2024-08-11 06:47:57,270 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 29 from LS+wenet, 19 from Vox, 18 from AS 2024-08-11 06:48:11,614 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 from AS 2024-08-11 06:48:22,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=968490.0, ans=0.09899494936611666 2024-08-11 06:48:28,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=968490.0, ans=0.125 2024-08-11 06:48:31,846 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
24 from LS+wenet, 16 from Vox, 31 from AS 2024-08-11 06:48:34,797 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9900, loss[loss=0.09454, beats_loss=0.01264, ecapa_loss=0.0002106, whisper_loss=0.0798, over 23144.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01145, ecapa_loss=0.00021, whisper_loss=0.0933, over 3865449.81 frames. ], batch size: 92, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:48:42,269 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 13 from Vox, 22 from AS 2024-08-11 06:48:54,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.44 vs. limit=15.0 2024-08-11 06:48:54,950 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 19 from Vox, 39 from AS 2024-08-11 06:48:59,139 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 from AS 2024-08-11 06:49:03,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=968790.0, ans=0.125 2024-08-11 06:49:07,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=968790.0, ans=0.2 2024-08-11 06:49:09,940 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 12 from Vox, 25 from AS 2024-08-11 06:49:13,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-08-11 06:49:20,172 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. 
limit=6.0 2024-08-11 06:49:32,113 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.797e+01 3.066e+01 3.610e+01 6.025e+01, threshold=6.133e+01, percent-clipped=2.0 2024-08-11 06:49:35,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.26 vs. limit=10.0 2024-08-11 06:49:45,266 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 9950, loss[loss=0.1037, beats_loss=0.01259, ecapa_loss=0.0001718, whisper_loss=0.08942, over 19453.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01132, ecapa_loss=0.0002114, whisper_loss=0.09413, over 3879734.94 frames. ], batch size: 76, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:49:47,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=969090.0, ans=0.0 2024-08-11 06:49:51,148 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 18 from Vox, 35 from AS 2024-08-11 06:50:11,799 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 19 from Vox, 26 from AS 2024-08-11 06:50:12,125 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.042e-01 2024-08-11 06:50:12,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=969290.0, ans=0.0 2024-08-11 06:50:14,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=969290.0, ans=0.125 2024-08-11 06:50:27,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=969390.0, ans=0.1 2024-08-11 06:50:30,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.74 vs. 
limit=22.5 2024-08-11 06:50:33,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=969390.0, ans=0.0 2024-08-11 06:50:51,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=969490.0, ans=0.0 2024-08-11 06:50:58,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10000, loss[loss=0.1178, beats_loss=0.009683, ecapa_loss=0.0001953, whisper_loss=0.1061, over 14494.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01132, ecapa_loss=0.0002093, whisper_loss=0.09449, over 3868019.96 frames. ], batch size: 56, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:51:00,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=969590.0, ans=0.0 2024-08-11 06:51:00,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.33 vs. limit=6.0 2024-08-11 06:51:02,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=969590.0, ans=0.1 2024-08-11 06:51:05,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=969590.0, ans=0.1 2024-08-11 06:51:19,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=969690.0, ans=0.0 2024-08-11 06:51:29,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.42 vs. limit=10.0 2024-08-11 06:51:34,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=969790.0, ans=0.1 2024-08-11 06:51:44,033 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
32 from LS+wenet, 27 from Vox, 30 from AS 2024-08-11 06:51:52,763 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2024-08-11 06:51:56,220 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.627e+01 2.974e+01 3.477e+01 5.733e+01, threshold=5.949e+01, percent-clipped=0.0 2024-08-11 06:52:09,047 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10050, loss[loss=0.09124, beats_loss=0.01113, ecapa_loss=0.0002232, whisper_loss=0.07788, over 22268.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01121, ecapa_loss=0.0002101, whisper_loss=0.09498, over 3893262.06 frames. ], batch size: 90, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:52:31,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=970190.0, ans=0.1 2024-08-11 06:52:46,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=970290.0, ans=0.125 2024-08-11 06:53:15,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=970490.0, ans=0.2 2024-08-11 06:53:18,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10100, loss[loss=0.08739, beats_loss=0.01193, ecapa_loss=0.0002782, whisper_loss=0.07268, over 13087.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01128, ecapa_loss=0.000212, whisper_loss=0.09454, over 3888594.44 frames. 
], batch size: 57, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:53:20,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=970590.0, ans=0.05 2024-08-11 06:53:45,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=970790.0, ans=0.125 2024-08-11 06:53:55,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2024-08-11 06:54:01,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=970890.0, ans=0.1 2024-08-11 06:54:11,570 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.833e+01 3.189e+01 3.704e+01 6.701e+01, threshold=6.379e+01, percent-clipped=2.0 2024-08-11 06:54:15,569 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 from AS 2024-08-11 06:54:16,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=970990.0, ans=0.0 2024-08-11 06:54:23,127 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10150, loss[loss=0.09673, beats_loss=0.01366, ecapa_loss=0.0002325, whisper_loss=0.08074, over 15078.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01131, ecapa_loss=0.0002128, whisper_loss=0.09417, over 3894568.12 frames. 
], batch size: 63, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:54:23,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=971090.0, ans=0.1 2024-08-11 06:54:24,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=971090.0, ans=0.0 2024-08-11 06:54:25,854 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 from AS 2024-08-11 06:54:31,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=971090.0, ans=0.125 2024-08-11 06:54:35,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=971190.0, ans=0.2 2024-08-11 06:54:40,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=971190.0, ans=0.125 2024-08-11 06:54:41,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=971190.0, ans=0.05 2024-08-11 06:54:51,672 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.55 vs. limit=22.5 2024-08-11 06:55:01,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=971390.0, ans=0.1 2024-08-11 06:55:03,697 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
20 from LS+wenet, 18 from Vox, 35 from AS 2024-08-11 06:55:17,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=971490.0, ans=0.035 2024-08-11 06:55:21,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=971490.0, ans=0.125 2024-08-11 06:55:28,604 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10200, loss[loss=0.112, beats_loss=0.01196, ecapa_loss=0.0001895, whisper_loss=0.09814, over 20089.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01138, ecapa_loss=0.0002112, whisper_loss=0.09419, over 3878925.51 frames. ], batch size: 79, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:55:28,817 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 from AS 2024-08-11 06:55:29,920 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 from AS 2024-08-11 06:55:50,830 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.86 vs. 
limit=10.0 2024-08-11 06:55:52,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=971690.0, ans=0.0 2024-08-11 06:56:08,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=971890.0, ans=0.1 2024-08-11 06:56:13,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=971890.0, ans=0.0 2024-08-11 06:56:16,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=971890.0, ans=0.1 2024-08-11 06:56:22,076 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.593e+01 3.063e+01 3.580e+01 1.842e+02, threshold=6.125e+01, percent-clipped=1.0 2024-08-11 06:56:23,477 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 12 from Vox, 40 from AS 2024-08-11 06:56:29,983 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 from AS 2024-08-11 06:56:33,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10250, loss[loss=0.1235, beats_loss=0.01132, ecapa_loss=0.0001787, whisper_loss=0.1104, over 21992.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01141, ecapa_loss=0.00021, whisper_loss=0.09375, over 3874131.32 frames. ], batch size: 86, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:56:47,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=972190.0, ans=0.2 2024-08-11 06:56:59,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=972290.0, ans=0.0 2024-08-11 06:57:05,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=972290.0, ans=0.035 2024-08-11 06:57:14,083 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
21 from LS+wenet, 18 from Vox, 21 from AS 2024-08-11 06:57:35,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=972490.0, ans=0.0 2024-08-11 06:57:38,646 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10300, loss[loss=0.121, beats_loss=0.008659, ecapa_loss=0.000192, whisper_loss=0.1104, over 20350.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01144, ecapa_loss=0.0002086, whisper_loss=0.09314, over 3881151.36 frames. ], batch size: 76, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:57:48,937 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 12 from Vox, 29 from AS 2024-08-11 06:57:53,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=972690.0, ans=0.125 2024-08-11 06:57:54,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=972690.0, ans=0.125 2024-08-11 06:57:56,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=972690.0, ans=0.125 2024-08-11 06:57:58,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=972690.0, ans=0.09899494936611666 2024-08-11 06:58:09,848 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.424e+02 2024-08-11 06:58:26,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=972890.0, ans=0.025 2024-08-11 06:58:26,504 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 06:58:27,640 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
22 from LS+wenet, 22 from Vox, 40 from AS 2024-08-11 06:58:28,923 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 12 from Vox, 36 from AS 2024-08-11 06:58:31,179 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.762e+01 3.121e+01 3.725e+01 5.735e+01, threshold=6.242e+01, percent-clipped=0.0 2024-08-11 06:58:36,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=972990.0, ans=0.025 2024-08-11 06:58:42,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5 2024-08-11 06:58:42,808 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10350, loss[loss=0.1039, beats_loss=0.009955, ecapa_loss=0.0002323, whisper_loss=0.09166, over 17921.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01149, ecapa_loss=0.0002076, whisper_loss=0.0928, over 3902375.18 frames. ], batch size: 68, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 06:59:03,912 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 25 from Vox, 37 from AS 2024-08-11 06:59:10,521 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 from AS 2024-08-11 06:59:15,427 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 from AS 2024-08-11 06:59:24,491 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 from AS 2024-08-11 06:59:39,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.58 vs. limit=15.0 2024-08-11 06:59:48,060 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10400, loss[loss=0.1118, beats_loss=0.01066, ecapa_loss=0.0002034, whisper_loss=0.09907, over 22415.00 frames. 
], tot_loss[loss=0.1065, beats_loss=0.01147, ecapa_loss=0.0002075, whisper_loss=0.09298, over 3909379.28 frames. ], batch size: 92, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 06:59:52,672 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.42 vs. limit=22.5 2024-08-11 06:59:59,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-08-11 07:00:13,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=973790.0, ans=0.125 2024-08-11 07:00:23,744 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 27 from Vox, 33 from AS 2024-08-11 07:00:42,114 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.078e+01 2.630e+01 2.925e+01 3.255e+01 4.896e+01, threshold=5.851e+01, percent-clipped=0.0 2024-08-11 07:00:45,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=973990.0, ans=0.125 2024-08-11 07:00:53,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10450, loss[loss=0.104, beats_loss=0.01231, ecapa_loss=0.000162, whisper_loss=0.09004, over 23482.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01148, ecapa_loss=0.0002076, whisper_loss=0.09236, over 3905394.94 frames. ], batch size: 94, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:01:11,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=974190.0, ans=0.0 2024-08-11 07:01:30,449 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
15 from LS+wenet, 18 from Vox, 30 from AS 2024-08-11 07:01:57,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=974490.0, ans=0.125 2024-08-11 07:02:02,220 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10500, loss[loss=0.07349, beats_loss=0.01344, ecapa_loss=0.0002034, whisper_loss=0.05801, over 13318.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0114, ecapa_loss=0.0002081, whisper_loss=0.09327, over 3859905.58 frames. ], batch size: 55, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:02:37,876 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 19 from LS+wenet, 34 from Vox, 32 from AS 2024-08-11 07:02:46,043 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 20 from LS+wenet, 28 from Vox, 38 from AS 2024-08-11 07:02:47,510 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 from AS 2024-08-11 07:02:52,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=974890.0, ans=0.0 2024-08-11 07:02:57,722 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.661e+01 2.970e+01 3.368e+01 5.123e+01, threshold=5.939e+01, percent-clipped=0.0 2024-08-11 07:03:10,163 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10550, loss[loss=0.1192, beats_loss=0.01239, ecapa_loss=0.0001917, whisper_loss=0.1049, over 23721.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01142, ecapa_loss=0.0002094, whisper_loss=0.09279, over 3827008.64 frames. ], batch size: 93, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:03:11,454 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
34 from LS+wenet, 17 from Vox, 39 from AS 2024-08-11 07:03:13,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=975090.0, ans=0.125 2024-08-11 07:03:20,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=975090.0, ans=0.125 2024-08-11 07:03:35,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=975290.0, ans=0.1 2024-08-11 07:03:58,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=975390.0, ans=0.125 2024-08-11 07:03:58,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=975390.0, ans=0.125 2024-08-11 07:04:07,302 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 from AS 2024-08-11 07:04:10,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=975490.0, ans=0.0 2024-08-11 07:04:13,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=975490.0, ans=10.0 2024-08-11 07:04:18,159 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10600, loss[loss=0.08805, beats_loss=0.01142, ecapa_loss=0.0001938, whisper_loss=0.07469, over 20831.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01135, ecapa_loss=0.0002103, whisper_loss=0.09316, over 3838154.04 frames. ], batch size: 85, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:04:23,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=975590.0, ans=0.1 2024-08-11 07:04:34,475 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
20 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 07:04:42,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=975690.0, ans=0.1 2024-08-11 07:04:51,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=975790.0, ans=0.1 2024-08-11 07:05:11,775 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 2.786e+01 3.038e+01 3.518e+01 8.413e+01, threshold=6.076e+01, percent-clipped=1.0 2024-08-11 07:05:14,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=975990.0, ans=10.0 2024-08-11 07:05:21,303 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 07:05:23,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10650, loss[loss=0.1152, beats_loss=0.01022, ecapa_loss=0.0002006, whisper_loss=0.103, over 23472.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01127, ecapa_loss=0.0002086, whisper_loss=0.09353, over 3850788.06 frames. ], batch size: 94, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:05:42,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=976190.0, ans=0.0 2024-08-11 07:05:51,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=976290.0, ans=0.125 2024-08-11 07:05:54,533 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 07:06:09,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0 2024-08-11 07:06:15,593 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 07:06:30,006 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10700, loss[loss=0.08801, beats_loss=0.01039, ecapa_loss=0.0002715, whisper_loss=0.0749, over 16410.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01134, ecapa_loss=0.0002072, whisper_loss=0.09364, over 3858487.30 frames. ], batch size: 69, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:06:37,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=976590.0, ans=0.125 2024-08-11 07:06:50,951 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. limit=10.0 2024-08-11 07:06:55,718 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-11 07:07:09,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=976890.0, ans=0.125 2024-08-11 07:07:12,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=976890.0, ans=0.0 2024-08-11 07:07:14,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=976890.0, ans=0.1 2024-08-11 07:07:15,830 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.28 vs. limit=22.5 2024-08-11 07:07:24,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.712e+01 3.090e+01 3.800e+01 9.134e+01, threshold=6.180e+01, percent-clipped=2.0 2024-08-11 07:07:36,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10750, loss[loss=0.1038, beats_loss=0.0127, ecapa_loss=0.0002036, whisper_loss=0.08906, over 19940.00 frames. 
], tot_loss[loss=0.1071, beats_loss=0.01137, ecapa_loss=0.000209, whisper_loss=0.09363, over 3844495.83 frames. ], batch size: 79, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:07:38,190 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 07:07:59,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=977190.0, ans=0.0 2024-08-11 07:08:03,459 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 07:08:07,047 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 07:08:12,217 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 07:08:14,785 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 07:08:22,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.33 vs. limit=22.5 2024-08-11 07:08:32,820 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-11 07:08:43,342 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10800, loss[loss=0.09489, beats_loss=0.01281, ecapa_loss=0.0002192, whisper_loss=0.07989, over 21150.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01135, ecapa_loss=0.0002103, whisper_loss=0.09367, over 3851284.66 frames. ], batch size: 91, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:08:44,920 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 07:08:46,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=977590.0, ans=0.125 2024-08-11 07:09:01,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=977690.0, ans=0.1 2024-08-11 07:09:13,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=977790.0, ans=0.125 2024-08-11 07:09:14,917 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 07:09:33,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.60 vs. limit=10.0 2024-08-11 07:09:36,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.33 vs. limit=6.0 2024-08-11 07:09:36,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-11 07:09:39,122 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.607e+01 2.912e+01 3.510e+01 6.638e+01, threshold=5.825e+01, percent-clipped=1.0 2024-08-11 07:09:42,320 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 07:09:51,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10850, loss[loss=0.1254, beats_loss=0.01024, ecapa_loss=0.0002199, whisper_loss=0.113, over 18611.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01132, ecapa_loss=0.0002107, whisper_loss=0.09317, over 3847535.39 frames. ], batch size: 77, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:09:56,130 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 07:09:57,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=978090.0, ans=0.04949747468305833 2024-08-11 07:09:58,829 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 28 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 07:10:00,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=978090.0, ans=0.125 2024-08-11 07:10:08,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=978190.0, ans=0.1 2024-08-11 07:10:17,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.34 vs. limit=15.0 2024-08-11 07:10:25,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=978290.0, ans=0.2 2024-08-11 07:10:32,788 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 07:10:49,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=978490.0, ans=0.125 2024-08-11 07:10:58,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=978590.0, ans=0.125 2024-08-11 07:10:59,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10900, loss[loss=0.08849, beats_loss=0.01398, ecapa_loss=0.0001908, whisper_loss=0.0726, over 19243.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01133, ecapa_loss=0.0002094, whisper_loss=0.09329, over 3850708.01 frames. ], batch size: 81, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:11:04,189 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 32 from Vox, 29 fro AS 2024-08-11 07:11:04,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=978590.0, ans=0.035 2024-08-11 07:11:10,662 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 07:11:19,005 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 07:11:19,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-11 07:11:27,527 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.614e-02 2024-08-11 07:11:34,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.91 vs. limit=15.0 2024-08-11 07:11:52,610 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 07:11:55,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.834e+01 3.154e+01 3.675e+01 5.808e+01, threshold=6.308e+01, percent-clipped=0.0 2024-08-11 07:12:07,474 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 10950, loss[loss=0.09998, beats_loss=0.01024, ecapa_loss=0.0001784, whisper_loss=0.08795, over 20332.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01142, ecapa_loss=0.0002088, whisper_loss=0.09257, over 3840545.79 frames. 
], batch size: 78, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:12:21,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=979190.0, ans=0.1 2024-08-11 07:12:37,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=979290.0, ans=0.125 2024-08-11 07:12:37,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=979290.0, ans=0.125 2024-08-11 07:12:46,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=979390.0, ans=0.95 2024-08-11 07:12:48,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=979390.0, ans=0.125 2024-08-11 07:12:57,091 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-11 07:13:01,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=979490.0, ans=0.2 2024-08-11 07:13:13,877 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11000, loss[loss=0.09277, beats_loss=0.01297, ecapa_loss=0.0002009, whisper_loss=0.07779, over 21568.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01141, ecapa_loss=0.0002088, whisper_loss=0.093, over 3838841.93 frames. ], batch size: 91, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:13:15,632 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 07:13:17,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=979590.0, ans=0.125 2024-08-11 07:13:17,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-11 07:13:21,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.34 vs. limit=15.0 2024-08-11 07:13:29,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=979690.0, ans=0.1 2024-08-11 07:13:40,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=979790.0, ans=0.0 2024-08-11 07:14:00,997 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-11 07:14:08,871 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.630e+01 2.984e+01 3.392e+01 5.712e+01, threshold=5.968e+01, percent-clipped=0.0 2024-08-11 07:14:09,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=979990.0, ans=0.1 2024-08-11 07:14:11,803 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 07:14:13,136 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 07:14:14,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=979990.0, ans=0.0 2024-08-11 07:14:20,891 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11050, loss[loss=0.1074, beats_loss=0.009586, ecapa_loss=0.0002252, whisper_loss=0.09561, over 16030.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.01137, ecapa_loss=0.0002091, whisper_loss=0.09312, over 3852869.27 frames. ], batch size: 64, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:14:26,463 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 23 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-11 07:14:32,722 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-11 07:14:34,010 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 07:14:43,620 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 07:14:47,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=980290.0, ans=0.125 2024-08-11 07:14:48,947 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-11 07:14:53,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=980290.0, ans=0.125 2024-08-11 07:14:56,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=980290.0, ans=0.1 2024-08-11 07:14:57,159 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 07:15:13,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=980490.0, ans=0.2 2024-08-11 07:15:20,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=980490.0, ans=0.125 2024-08-11 07:15:28,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11100, loss[loss=0.08067, beats_loss=0.01341, ecapa_loss=0.0001866, whisper_loss=0.06539, over 18647.00 frames. 
], tot_loss[loss=0.1069, beats_loss=0.01131, ecapa_loss=0.0002093, whisper_loss=0.09351, over 3846120.27 frames. ], batch size: 73, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:15:36,257 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 07:15:38,081 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:15:40,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=980690.0, ans=0.5 2024-08-11 07:15:44,134 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 07:15:44,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=980690.0, ans=0.125 2024-08-11 07:15:51,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=980690.0, ans=0.0 2024-08-11 07:15:53,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2024-08-11 07:15:55,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=980790.0, ans=0.125 2024-08-11 07:15:55,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=980790.0, ans=0.1 2024-08-11 07:16:20,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=980890.0, ans=0.125 2024-08-11 07:16:22,656 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
16 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 07:16:22,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=980990.0, ans=0.1 2024-08-11 07:16:23,720 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.195e+01 2.722e+01 3.049e+01 3.591e+01 6.029e+01, threshold=6.098e+01, percent-clipped=1.0 2024-08-11 07:16:36,256 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11150, loss[loss=0.1049, beats_loss=0.01233, ecapa_loss=0.0001854, whisper_loss=0.09069, over 22511.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01138, ecapa_loss=0.0002077, whisper_loss=0.09301, over 3848178.33 frames. ], batch size: 89, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:16:46,279 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 07:16:49,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=981190.0, ans=0.125 2024-08-11 07:16:57,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=981190.0, ans=0.1 2024-08-11 07:16:59,602 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-11 07:17:07,045 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.77 vs. 
limit=15.0 2024-08-11 07:17:14,828 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:17:21,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=981390.0, ans=0.0 2024-08-11 07:17:43,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11200, loss[loss=0.1046, beats_loss=0.0136, ecapa_loss=0.0001872, whisper_loss=0.08916, over 22596.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01129, ecapa_loss=0.0002095, whisper_loss=0.09341, over 3859478.41 frames. ], batch size: 94, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:17:50,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=981590.0, ans=0.125 2024-08-11 07:18:00,717 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=12.0 2024-08-11 07:18:01,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=981690.0, ans=0.125 2024-08-11 07:18:13,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=981790.0, ans=0.2 2024-08-11 07:18:33,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=981890.0, ans=0.125 2024-08-11 07:18:34,328 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 28 from Vox, 22 fro AS 2024-08-11 07:18:37,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.25 vs. 
limit=22.5 2024-08-11 07:18:39,133 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 2.676e+01 2.993e+01 3.397e+01 5.977e+01, threshold=5.986e+01, percent-clipped=0.0 2024-08-11 07:18:46,970 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0 2024-08-11 07:18:51,708 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11250, loss[loss=0.08385, beats_loss=0.01281, ecapa_loss=0.000188, whisper_loss=0.06916, over 13358.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01131, ecapa_loss=0.0002095, whisper_loss=0.09295, over 3845471.68 frames. ], batch size: 54, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:18:51,937 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 07:18:58,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=982090.0, ans=0.125 2024-08-11 07:19:01,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=982090.0, ans=0.1 2024-08-11 07:19:01,510 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.051e-03 2024-08-11 07:19:21,456 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 30 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 07:19:21,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=982290.0, ans=0.1 2024-08-11 07:19:36,931 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
23 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 07:19:39,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=982390.0, ans=0.125 2024-08-11 07:19:52,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=982490.0, ans=0.125 2024-08-11 07:19:58,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=982590.0, ans=0.015 2024-08-11 07:19:59,735 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11300, loss[loss=0.1068, beats_loss=0.01156, ecapa_loss=0.0001657, whisper_loss=0.09358, over 17740.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0113, ecapa_loss=0.0002091, whisper_loss=0.09333, over 3875254.40 frames. ], batch size: 71, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:20:07,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=982590.0, ans=0.1 2024-08-11 07:20:21,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=982690.0, ans=0.0 2024-08-11 07:20:28,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=982790.0, ans=0.125 2024-08-11 07:20:53,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.719e+01 3.008e+01 3.388e+01 1.679e+02, threshold=6.016e+01, percent-clipped=1.0 2024-08-11 07:20:54,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=982990.0, ans=0.2 2024-08-11 07:21:05,477 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11350, loss[loss=0.09716, beats_loss=0.013, ecapa_loss=0.0002039, whisper_loss=0.08212, over 22866.00 frames. 
], tot_loss[loss=0.1065, beats_loss=0.01132, ecapa_loss=0.0002087, whisper_loss=0.09307, over 3865737.46 frames. ], batch size: 93, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:21:10,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=983090.0, ans=0.05 2024-08-11 07:21:14,501 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 07:21:18,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=983190.0, ans=0.125 2024-08-11 07:21:19,792 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 07:21:20,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=983190.0, ans=0.0 2024-08-11 07:21:21,556 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=22.5 2024-08-11 07:21:24,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=983190.0, ans=0.0 2024-08-11 07:21:26,928 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2024-08-11 07:21:42,087 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.04 vs. limit=10.0 2024-08-11 07:22:10,218 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11400, loss[loss=0.1122, beats_loss=0.009404, ecapa_loss=0.0002151, whisper_loss=0.1007, over 20343.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01139, ecapa_loss=0.0002065, whisper_loss=0.09318, over 3884983.61 frames. 
], batch size: 78, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:22:26,235 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.922e-02 2024-08-11 07:22:36,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=983790.0, ans=0.0 2024-08-11 07:22:37,235 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 07:22:44,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=983790.0, ans=0.0 2024-08-11 07:22:54,766 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-11 07:23:02,242 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.896e+01 3.252e+01 3.905e+01 6.465e+01, threshold=6.504e+01, percent-clipped=1.0 2024-08-11 07:23:03,707 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 07:23:06,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=983990.0, ans=0.0 2024-08-11 07:23:13,920 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11450, loss[loss=0.1026, beats_loss=0.01382, ecapa_loss=0.0001885, whisper_loss=0.08687, over 21738.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01141, ecapa_loss=0.0002085, whisper_loss=0.09342, over 3896470.69 frames. ], batch size: 93, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:23:21,316 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.55 vs. 
limit=15.0 2024-08-11 07:23:31,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=984190.0, ans=0.0 2024-08-11 07:23:36,075 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 16 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 07:23:45,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=984290.0, ans=0.125 2024-08-11 07:24:23,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11500, loss[loss=0.104, beats_loss=0.009147, ecapa_loss=0.0002417, whisper_loss=0.0924, over 21580.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01131, ecapa_loss=0.0002085, whisper_loss=0.09399, over 3885441.46 frames. ], batch size: 88, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:24:45,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=984690.0, ans=0.2 2024-08-11 07:25:27,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=984890.0, ans=0.125 2024-08-11 07:25:45,087 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.765e+01 3.010e+01 3.592e+01 5.034e+01, threshold=6.021e+01, percent-clipped=0.0 2024-08-11 07:26:02,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=985090.0, ans=0.05 2024-08-11 07:26:03,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11550, loss[loss=0.102, beats_loss=0.01223, ecapa_loss=0.000212, whisper_loss=0.08764, over 20623.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01131, ecapa_loss=0.0002077, whisper_loss=0.0943, over 3883555.12 frames. 
], batch size: 86, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:26:13,297 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=22.5 2024-08-11 07:26:21,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=985190.0, ans=0.125 2024-08-11 07:27:41,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=985490.0, ans=0.125 2024-08-11 07:27:52,392 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11600, loss[loss=0.1205, beats_loss=0.01012, ecapa_loss=0.0001895, whisper_loss=0.1085, over 17094.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0113, ecapa_loss=0.0002063, whisper_loss=0.09443, over 3883744.35 frames. ], batch size: 63, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:27:56,177 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 07:28:19,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=985690.0, ans=0.0 2024-08-11 07:28:54,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=985790.0, ans=0.1 2024-08-11 07:28:59,668 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 07:29:23,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0 2024-08-11 07:29:29,864 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.587e+01 2.898e+01 3.413e+01 5.144e+01, threshold=5.796e+01, percent-clipped=0.0 2024-08-11 07:29:30,000 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-11 07:29:36,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=985990.0, ans=0.0 2024-08-11 07:29:43,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=986090.0, ans=0.0 2024-08-11 07:29:44,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11650, loss[loss=0.09702, beats_loss=0.01264, ecapa_loss=0.0001818, whisper_loss=0.08257, over 21364.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01132, ecapa_loss=0.0002054, whisper_loss=0.09447, over 3888284.99 frames. ], batch size: 85, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:29:55,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=986090.0, ans=0.2 2024-08-11 07:30:11,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=986190.0, ans=0.0 2024-08-11 07:30:38,511 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.81 vs. limit=15.0 2024-08-11 07:30:45,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=986390.0, ans=0.2 2024-08-11 07:30:51,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=986390.0, ans=0.09899494936611666 2024-08-11 07:30:51,649 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. limit=6.0 2024-08-11 07:30:52,667 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
29 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 07:31:00,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=986490.0, ans=0.1 2024-08-11 07:31:09,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=986490.0, ans=0.125 2024-08-11 07:31:13,053 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11700, loss[loss=0.1053, beats_loss=0.01329, ecapa_loss=0.000237, whisper_loss=0.08967, over 21705.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01142, ecapa_loss=0.0002047, whisper_loss=0.09471, over 3898784.69 frames. ], batch size: 91, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:31:23,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=986590.0, ans=0.125 2024-08-11 07:31:37,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=986690.0, ans=0.2 2024-08-11 07:31:44,340 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-11 07:31:44,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=986690.0, ans=0.0 2024-08-11 07:32:23,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=986990.0, ans=10.0 2024-08-11 07:32:24,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.842e+01 3.149e+01 3.845e+01 7.778e+01, threshold=6.297e+01, percent-clipped=3.0 2024-08-11 07:32:24,706 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
21 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-11 07:32:39,080 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11750, loss[loss=0.1113, beats_loss=0.01233, ecapa_loss=0.0001905, whisper_loss=0.0971, over 21453.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01147, ecapa_loss=0.0002046, whisper_loss=0.0947, over 3916442.04 frames. ], batch size: 86, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:33:14,279 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 38 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 07:33:27,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=987290.0, ans=0.1 2024-08-11 07:34:00,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2024-08-11 07:34:03,641 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 07:34:09,384 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11800, loss[loss=0.1104, beats_loss=0.01299, ecapa_loss=0.0002274, whisper_loss=0.09514, over 21245.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01145, ecapa_loss=0.0002048, whisper_loss=0.09494, over 3923955.43 frames. 
], batch size: 91, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:34:25,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=987690.0, ans=0.1 2024-08-11 07:34:44,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=987790.0, ans=0.2 2024-08-11 07:35:18,948 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.720e+01 3.073e+01 3.423e+01 3.198e+02, threshold=6.145e+01, percent-clipped=1.0 2024-08-11 07:35:36,256 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11850, loss[loss=0.1049, beats_loss=0.01023, ecapa_loss=0.0002261, whisper_loss=0.09242, over 18267.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01145, ecapa_loss=0.0002045, whisper_loss=0.09455, over 3936050.47 frames. ], batch size: 72, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:36:06,819 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 07:36:15,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=988290.0, ans=0.125 2024-08-11 07:36:27,908 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
16 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-11 07:36:28,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=988390.0, ans=0.125 2024-08-11 07:36:30,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=988390.0, ans=0.2 2024-08-11 07:36:48,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=988490.0, ans=0.07 2024-08-11 07:36:50,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=988490.0, ans=0.125 2024-08-11 07:36:57,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=988490.0, ans=0.0 2024-08-11 07:37:01,828 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11900, loss[loss=0.1201, beats_loss=0.01193, ecapa_loss=0.0001846, whisper_loss=0.1064, over 23329.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01141, ecapa_loss=0.0002054, whisper_loss=0.0947, over 3927580.31 frames. ], batch size: 91, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:38:03,944 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 07:38:04,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=988990.0, ans=0.125 2024-08-11 07:38:06,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.737e+01 3.168e+01 3.571e+01 8.955e+01, threshold=6.335e+01, percent-clipped=2.0 2024-08-11 07:38:06,885 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
15 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 07:38:07,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=988990.0, ans=0.125 2024-08-11 07:38:18,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=988990.0, ans=0.0 2024-08-11 07:38:20,404 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 11950, loss[loss=0.08448, beats_loss=0.01085, ecapa_loss=0.0002387, whisper_loss=0.07125, over 17250.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01134, ecapa_loss=0.0002065, whisper_loss=0.09396, over 3874957.04 frames. ], batch size: 70, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:38:20,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=989090.0, ans=0.125 2024-08-11 07:38:33,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=989090.0, ans=0.125 2024-08-11 07:38:33,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=989090.0, ans=0.1 2024-08-11 07:38:38,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=989190.0, ans=0.1 2024-08-11 07:38:47,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=989190.0, ans=0.0 2024-08-11 07:38:57,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=989290.0, ans=0.125 2024-08-11 07:39:02,208 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 07:39:04,522 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.061e+01 2024-08-11 07:39:07,348 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 19 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-11 07:39:19,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=989390.0, ans=0.0 2024-08-11 07:39:19,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=989390.0, ans=0.2 2024-08-11 07:39:34,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=989490.0, ans=0.125 2024-08-11 07:39:37,880 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12000, loss[loss=0.1209, beats_loss=0.01353, ecapa_loss=0.0001964, whisper_loss=0.1054, over 22348.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0113, ecapa_loss=0.0002072, whisper_loss=0.09424, over 3882908.42 frames. ], batch size: 86, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:39:37,880 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 07:40:13,095 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on ASR_libri: loss=0.2587, beats_loss=0, ecapa_loss=0.0006674, whisper_loss=0.252, over 922467.00 frames. 2024-08-11 07:40:32,499 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on SV_voxceleb1: loss=0.005495, beats_loss=0, ecapa_loss=0.0005495, whisper_loss=0, over 939242.00 frames. 2024-08-11 07:42:18,262 INFO [train_multi_KD3.py:1149] (2/4) Epoch 7, validation on AT_audioset: loss=0.02554, beats_loss=0.02554, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-11 07:42:18,266 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 07:42:18,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=989590.0, ans=0.125 2024-08-11 07:42:53,525 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-11 07:43:00,926 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 07:43:06,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=989890.0, ans=0.125 2024-08-11 07:43:18,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2024-08-11 07:43:21,389 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.747e+01 3.219e+01 3.881e+01 9.695e+01, threshold=6.438e+01, percent-clipped=1.0 2024-08-11 07:43:23,542 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 33 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 07:43:25,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=989990.0, ans=0.125 2024-08-11 07:43:35,540 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12050, loss[loss=0.09743, beats_loss=0.01167, ecapa_loss=0.0001953, whisper_loss=0.0838, over 21228.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01131, ecapa_loss=0.0002064, whisper_loss=0.09419, over 3881260.29 frames. 
], batch size: 84, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:43:38,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=990090.0, ans=0.125 2024-08-11 07:43:46,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=990090.0, ans=0.0 2024-08-11 07:43:47,540 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 07:43:49,150 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 07:44:02,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=990190.0, ans=0.125 2024-08-11 07:44:11,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=990290.0, ans=0.1 2024-08-11 07:44:18,463 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 07:44:24,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=990390.0, ans=0.0 2024-08-11 07:44:44,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=990490.0, ans=0.04949747468305833 2024-08-11 07:44:50,570 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12100, loss[loss=0.09933, beats_loss=0.01351, ecapa_loss=0.0002267, whisper_loss=0.08356, over 19377.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01138, ecapa_loss=0.0002064, whisper_loss=0.09359, over 3873928.57 frames. ], batch size: 82, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:44:55,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.20 vs. 
limit=15.0 2024-08-11 07:45:11,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0 2024-08-11 07:45:27,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=990790.0, ans=0.125 2024-08-11 07:45:42,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=990890.0, ans=0.125 2024-08-11 07:45:43,105 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 07:45:46,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=990890.0, ans=0.125 2024-08-11 07:45:54,928 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.799e+01 3.089e+01 3.650e+01 5.391e+01, threshold=6.177e+01, percent-clipped=0.0 2024-08-11 07:46:02,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=990990.0, ans=0.125 2024-08-11 07:46:03,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=12.0 2024-08-11 07:46:10,312 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12150, loss[loss=0.1193, beats_loss=0.01047, ecapa_loss=0.0002069, whisper_loss=0.1068, over 21572.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01133, ecapa_loss=0.0002057, whisper_loss=0.09415, over 3865216.69 frames. 
], batch size: 84, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:46:15,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=991090.0, ans=0.125 2024-08-11 07:46:23,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=991090.0, ans=0.125 2024-08-11 07:46:30,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5 2024-08-11 07:46:43,878 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2024-08-11 07:46:48,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=991290.0, ans=0.125 2024-08-11 07:46:52,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=991290.0, ans=0.0 2024-08-11 07:46:55,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=991290.0, ans=0.125 2024-08-11 07:47:17,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=991490.0, ans=0.0 2024-08-11 07:47:18,832 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 07:47:30,468 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12200, loss[loss=0.1127, beats_loss=0.01018, ecapa_loss=0.0002167, whisper_loss=0.1003, over 21323.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01138, ecapa_loss=0.0002042, whisper_loss=0.09421, over 3877542.41 frames. ], batch size: 83, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:47:32,626 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
30 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 07:47:37,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=991590.0, ans=0.1 2024-08-11 07:47:59,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=991690.0, ans=0.125 2024-08-11 07:48:27,244 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 23 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-11 07:48:35,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.629e+01 2.882e+01 3.326e+01 5.595e+01, threshold=5.765e+01, percent-clipped=0.0 2024-08-11 07:48:49,399 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12250, loss[loss=0.09365, beats_loss=0.01093, ecapa_loss=0.0002077, whisper_loss=0.08064, over 18620.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01132, ecapa_loss=0.000204, whisper_loss=0.09466, over 3872172.20 frames. ], batch size: 74, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:48:59,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=992090.0, ans=0.04949747468305833 2024-08-11 07:49:09,197 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 07:49:09,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=12.0 2024-08-11 07:49:13,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=992190.0, ans=0.1 2024-08-11 07:49:22,845 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
20 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-11 07:49:38,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=992390.0, ans=0.0 2024-08-11 07:50:01,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.53 vs. limit=15.0 2024-08-11 07:50:05,990 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 07:50:07,670 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 07:50:08,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12300, loss[loss=0.1046, beats_loss=0.01113, ecapa_loss=0.0001772, whisper_loss=0.09167, over 14725.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.0113, ecapa_loss=0.0002053, whisper_loss=0.09482, over 3877735.50 frames. ], batch size: 54, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:50:13,810 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 07:50:15,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=992590.0, ans=0.0 2024-08-11 07:50:44,605 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-11 07:50:53,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=992790.0, ans=0.05 2024-08-11 07:51:00,719 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
24 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-11 07:51:01,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=992890.0, ans=0.125 2024-08-11 07:51:03,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2024-08-11 07:51:04,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=992890.0, ans=0.0 2024-08-11 07:51:12,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.794e+01 3.118e+01 3.585e+01 7.136e+01, threshold=6.237e+01, percent-clipped=2.0 2024-08-11 07:51:14,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=992990.0, ans=0.035 2024-08-11 07:51:27,284 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12350, loss[loss=0.09201, beats_loss=0.01012, ecapa_loss=0.0002504, whisper_loss=0.07939, over 15799.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01132, ecapa_loss=0.0002071, whisper_loss=0.09386, over 3828433.87 frames. ], batch size: 67, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:51:29,771 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=12.0 2024-08-11 07:51:37,919 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 07:51:44,323 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-11 07:51:49,968 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 07:51:50,577 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.87 vs. 
limit=15.0 2024-08-11 07:51:58,671 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-11 07:52:19,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=993390.0, ans=0.125 2024-08-11 07:52:21,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=993390.0, ans=0.0 2024-08-11 07:52:24,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=993390.0, ans=0.1 2024-08-11 07:52:41,677 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12400, loss[loss=0.07276, beats_loss=0.01204, ecapa_loss=0.0001789, whisper_loss=0.05893, over 16837.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01139, ecapa_loss=0.0002068, whisper_loss=0.09351, over 3851769.60 frames. ], batch size: 66, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:52:45,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=993590.0, ans=0.2 2024-08-11 07:52:47,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=993590.0, ans=0.125 2024-08-11 07:53:12,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=993790.0, ans=0.2 2024-08-11 07:53:23,113 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 07:53:24,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=993790.0, ans=0.125 2024-08-11 07:53:28,885 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-11 07:53:34,457 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
28 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 07:53:47,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+01 2.944e+01 3.370e+01 3.888e+01 6.179e+01, threshold=6.739e+01, percent-clipped=0.0 2024-08-11 07:54:00,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=994090.0, ans=0.125 2024-08-11 07:54:01,520 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12450, loss[loss=0.1248, beats_loss=0.009037, ecapa_loss=0.0002545, whisper_loss=0.1132, over 21713.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01129, ecapa_loss=0.0002056, whisper_loss=0.09421, over 3857447.47 frames. ], batch size: 89, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:54:25,395 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 07:54:45,361 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:55:02,337 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 07:55:02,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=994490.0, ans=0.025 2024-08-11 07:55:15,406 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 07:55:15,984 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2024-08-11 07:55:17,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=994590.0, ans=0.0 2024-08-11 07:55:19,312 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12500, loss[loss=0.1129, beats_loss=0.01092, ecapa_loss=0.000258, whisper_loss=0.09936, over 22166.00 frames. 
], tot_loss[loss=0.1078, beats_loss=0.01139, ecapa_loss=0.0002042, whisper_loss=0.09436, over 3880671.35 frames. ], batch size: 93, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:55:21,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=994590.0, ans=0.125 2024-08-11 07:55:30,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=994590.0, ans=0.0 2024-08-11 07:55:30,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2024-08-11 07:55:55,724 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.39 vs. limit=10.0 2024-08-11 07:56:13,940 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 07:56:23,235 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.789e+01 3.126e+01 3.797e+01 5.980e+01, threshold=6.252e+01, percent-clipped=0.0 2024-08-11 07:56:37,056 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12550, loss[loss=0.1075, beats_loss=0.01135, ecapa_loss=0.0001923, whisper_loss=0.09421, over 22197.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01129, ecapa_loss=0.0002045, whisper_loss=0.09476, over 3879442.01 frames. 
], batch size: 90, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:56:58,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=995190.0, ans=0.0 2024-08-11 07:57:03,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=995190.0, ans=0.125 2024-08-11 07:57:07,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=995290.0, ans=0.125 2024-08-11 07:57:19,869 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2024-08-11 07:57:28,629 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 07:57:46,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=22.5 2024-08-11 07:57:56,128 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12600, loss[loss=0.1173, beats_loss=0.009949, ecapa_loss=0.0002751, whisper_loss=0.1046, over 15390.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01141, ecapa_loss=0.000205, whisper_loss=0.09463, over 3884796.34 frames. ], batch size: 64, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:57:58,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=995590.0, ans=0.125 2024-08-11 07:58:05,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=995590.0, ans=0.07 2024-08-11 07:58:08,352 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 07:58:09,805 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
19 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-11 07:58:27,564 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 07:58:29,343 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 07:58:32,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=995790.0, ans=0.125 2024-08-11 07:59:00,671 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.576e+01 3.023e+01 3.555e+01 7.578e+01, threshold=6.047e+01, percent-clipped=3.0 2024-08-11 07:59:00,856 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 07:59:08,116 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 07:59:17,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12650, loss[loss=0.112, beats_loss=0.01155, ecapa_loss=0.0001723, whisper_loss=0.09877, over 17494.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01142, ecapa_loss=0.0002056, whisper_loss=0.09401, over 3883601.48 frames. ], batch size: 65, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:59:19,373 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 07:59:20,939 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 07:59:24,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=996090.0, ans=0.125 2024-08-11 07:59:30,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=996090.0, ans=0.1 2024-08-11 07:59:34,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.95 vs. 
limit=15.0 2024-08-11 07:59:41,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=996190.0, ans=0.125 2024-08-11 07:59:44,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=996190.0, ans=0.0 2024-08-11 07:59:46,295 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 from AS 2024-08-11 08:00:16,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=996390.0, ans=0.0 2024-08-11 08:00:22,054 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 18 from Vox, 49 from AS 2024-08-11 08:00:26,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=996490.0, ans=0.2 2024-08-11 08:00:42,480 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12700, loss[loss=0.08945, beats_loss=0.01195, ecapa_loss=0.0001572, whisper_loss=0.07593, over 18508.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01141, ecapa_loss=0.0002065, whisper_loss=0.0939, over 3882959.97 frames. ], batch size: 68, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:00:57,518 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 36 from LS+wenet, 16 from Vox, 32 from AS 2024-08-11 08:00:57,963 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2024-08-11 08:01:24,386 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
25 from LS+wenet, 23 from Vox, 18 from AS 2024-08-11 08:01:37,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=996890.0, ans=0.0 2024-08-11 08:01:38,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=996890.0, ans=0.125 2024-08-11 08:01:42,363 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2024-08-11 08:01:53,857 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.625e+01 2.937e+01 3.351e+01 6.413e+01, threshold=5.874e+01, percent-clipped=1.0 2024-08-11 08:01:54,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=996990.0, ans=0.125 2024-08-11 08:01:54,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=12.0 2024-08-11 08:02:01,765 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 from AS 2024-08-11 08:02:10,166 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12750, loss[loss=0.1324, beats_loss=0.009632, ecapa_loss=0.0001726, whisper_loss=0.1211, over 21589.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01142, ecapa_loss=0.0002062, whisper_loss=0.09469, over 3906731.12 frames. ], batch size: 78, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:02:15,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=997090.0, ans=0.04949747468305833 2024-08-11 08:02:22,023 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
28 from LS+wenet, 20 from Vox, 45 from AS 2024-08-11 08:02:28,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=997190.0, ans=0.1 2024-08-11 08:02:40,214 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 from AS 2024-08-11 08:02:40,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=997190.0, ans=0.125 2024-08-11 08:02:43,946 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 38 from LS+wenet, 19 from Vox, 36 from AS 2024-08-11 08:03:10,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=997390.0, ans=0.09899494936611666 2024-08-11 08:03:11,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=997390.0, ans=0.0 2024-08-11 08:03:32,712 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12800, loss[loss=0.1316, beats_loss=0.01003, ecapa_loss=0.0002403, whisper_loss=0.1191, over 14954.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01151, ecapa_loss=0.0002071, whisper_loss=0.09407, over 3939994.38 frames. ], batch size: 63, lr: 8.75e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:03:52,774 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 18 from Vox, 35 from AS 2024-08-11 08:03:56,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=997690.0, ans=0.0 2024-08-11 08:03:58,084 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 from AS 2024-08-11 08:04:10,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=997790.0, ans=0.05 2024-08-11 08:04:20,894 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
22 from LS+wenet, 20 from Vox, 31 from AS 2024-08-11 08:04:30,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=997890.0, ans=0.0 2024-08-11 08:04:33,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2024-08-11 08:04:40,061 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.47 vs. limit=12.0 2024-08-11 08:04:41,369 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 14 from Vox, 34 from AS 2024-08-11 08:04:42,778 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.631e+01 3.014e+01 3.452e+01 5.658e+01, threshold=6.028e+01, percent-clipped=0.0 2024-08-11 08:04:56,913 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12850, loss[loss=0.08589, beats_loss=0.01258, ecapa_loss=0.0001989, whisper_loss=0.07132, over 16775.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01161, ecapa_loss=0.0002066, whisper_loss=0.09288, over 3903481.75 frames. ], batch size: 69, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:05:03,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=998090.0, ans=0.2 2024-08-11 08:05:17,152 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 19 from Vox, 19 from AS 2024-08-11 08:05:17,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=998190.0, ans=0.1 2024-08-11 08:05:28,029 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 from AS 2024-08-11 08:05:29,398 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 from AS 2024-08-11 08:05:31,251 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
23 from LS+wenet, 27 from Vox, 41 from AS 2024-08-11 08:05:52,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=998390.0, ans=0.1 2024-08-11 08:05:54,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=998390.0, ans=0.125 2024-08-11 08:06:17,261 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12900, loss[loss=0.1276, beats_loss=0.01014, ecapa_loss=0.0001559, whisper_loss=0.1159, over 24400.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01163, ecapa_loss=0.0002052, whisper_loss=0.09238, over 3880292.75 frames. ], batch size: 92, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:06:17,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=998590.0, ans=0.125 2024-08-11 08:06:33,026 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 from AS 2024-08-11 08:06:39,698 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.393e+05 2024-08-11 08:06:59,486 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 from AS 2024-08-11 08:07:07,504 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 from AS 2024-08-11 08:07:24,821 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.613e+01 2.962e+01 3.305e+01 5.857e+01, threshold=5.923e+01, percent-clipped=0.0 2024-08-11 08:07:40,212 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 from AS 2024-08-11 08:07:42,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 12950, loss[loss=0.1076, beats_loss=0.01074, ecapa_loss=0.0002335, whisper_loss=0.09451, over 15184.00 frames. 
], tot_loss[loss=0.1072, beats_loss=0.01145, ecapa_loss=0.000207, whisper_loss=0.09371, over 3879227.80 frames. ], batch size: 61, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:07:47,352 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 23 from Vox, 22 from AS 2024-08-11 08:07:50,140 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2024-08-11 08:08:06,949 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 from AS 2024-08-11 08:08:20,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=999290.0, ans=0.125 2024-08-11 08:08:32,600 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 32 from Vox, 35 from AS 2024-08-11 08:08:47,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=999390.0, ans=0.95 2024-08-11 08:08:48,941 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS 2024-08-11 08:09:00,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=999490.0, ans=0.125 2024-08-11 08:09:01,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=999490.0, ans=0.125 2024-08-11 08:09:11,403 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13000, loss[loss=0.1135, beats_loss=0.008529, ecapa_loss=0.0002661, whisper_loss=0.1023, over 22483.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0114, ecapa_loss=0.0002065, whisper_loss=0.09396, over 3920412.21 frames. 
], batch size: 92, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:09:11,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=999590.0, ans=0.035 2024-08-11 08:10:25,057 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.079e+01 2.746e+01 3.044e+01 3.535e+01 5.645e+01, threshold=6.088e+01, percent-clipped=0.0 2024-08-11 08:10:33,053 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:10:33,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=15.0 2024-08-11 08:10:34,034 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 35 from LS+wenet, 13 from Vox, 36 from AS 2024-08-11 08:10:39,312 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13050, loss[loss=0.1117, beats_loss=0.0107, ecapa_loss=0.0001982, whisper_loss=0.09901, over 20254.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.0114, ecapa_loss=0.0002069, whisper_loss=0.09367, over 3933913.43 frames. ], batch size: 80, lr: 8.74e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:10:44,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1000090.0, ans=0.0 2024-08-11 08:10:45,794 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 22 from Vox, 26 from AS 2024-08-11 08:11:13,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1000290.0, ans=0.0 2024-08-11 08:11:15,015 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
21 from LS+wenet, 19 from Vox, 30 from AS 2024-08-11 08:11:40,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1000490.0, ans=0.2 2024-08-11 08:11:47,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1000490.0, ans=0.0 2024-08-11 08:11:55,737 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13100, loss[loss=0.1233, beats_loss=0.01073, ecapa_loss=0.0001536, whisper_loss=0.111, over 15939.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01135, ecapa_loss=0.000205, whisper_loss=0.09395, over 3926405.71 frames. ], batch size: 59, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:12:09,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1000690.0, ans=0.025 2024-08-11 08:12:13,315 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 28 from LS+wenet, 16 from Vox, 28 from AS 2024-08-11 08:12:21,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1000690.0, ans=0.04949747468305833 2024-08-11 08:12:43,897 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
23 from LS+wenet, 18 from Vox, 32 from AS 2024-08-11 08:12:51,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1000890.0, ans=0.1 2024-08-11 08:12:54,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.920e+01 3.431e+01 3.898e+01 1.839e+02, threshold=6.862e+01, percent-clipped=3.0 2024-08-11 08:13:01,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1000990.0, ans=0.0 2024-08-11 08:13:02,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1000990.0, ans=0.2 2024-08-11 08:13:07,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13150, loss[loss=0.09979, beats_loss=0.01517, ecapa_loss=0.0001642, whisper_loss=0.08298, over 22784.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01135, ecapa_loss=0.0002044, whisper_loss=0.09428, over 3934037.53 frames. ], batch size: 90, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:13:09,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1001090.0, ans=0.1 2024-08-11 08:13:34,599 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 from AS 2024-08-11 08:13:39,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1001290.0, ans=0.125 2024-08-11 08:14:01,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1001390.0, ans=0.0 2024-08-11 08:14:16,816 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
16 from LS+wenet, 23 from Vox, 26 from AS 2024-08-11 08:14:20,793 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13200, loss[loss=0.09002, beats_loss=0.0118, ecapa_loss=0.000198, whisper_loss=0.07625, over 13560.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01141, ecapa_loss=0.0002038, whisper_loss=0.09359, over 3907630.91 frames. ], batch size: 57, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:14:27,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1001590.0, ans=0.0 2024-08-11 08:14:27,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2024-08-11 08:14:28,889 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 from AS 2024-08-11 08:14:34,990 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 from AS 2024-08-11 08:14:43,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1001690.0, ans=0.0 2024-08-11 08:14:47,559 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 from AS 2024-08-11 08:14:47,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1001690.0, ans=0.0 2024-08-11 08:14:52,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1001790.0, ans=0.125 2024-08-11 08:15:02,005 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2024-08-11 08:15:02,149 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. 
limit=15.0 2024-08-11 08:15:22,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.032e+01 2.762e+01 3.091e+01 3.560e+01 4.785e+01, threshold=6.182e+01, percent-clipped=0.0 2024-08-11 08:15:30,394 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 from AS 2024-08-11 08:15:36,293 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13250, loss[loss=0.08665, beats_loss=0.01035, ecapa_loss=0.0003269, whisper_loss=0.07303, over 17858.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01134, ecapa_loss=0.0002051, whisper_loss=0.09378, over 3927626.81 frames. ], batch size: 77, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:15:43,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.12 vs. limit=22.5 2024-08-11 08:15:59,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2024-08-11 08:16:02,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1002190.0, ans=0.1 2024-08-11 08:16:05,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1002290.0, ans=0.125 2024-08-11 08:16:06,420 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 from AS 2024-08-11 08:16:30,455 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 19 from Vox, 37 from AS 2024-08-11 08:16:45,819 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 28 from Vox, 30 from AS 2024-08-11 08:16:51,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13300, loss[loss=0.1188, beats_loss=0.009818, ecapa_loss=0.0002173, whisper_loss=0.1068, over 22445.00 frames. 
], tot_loss[loss=0.1077, beats_loss=0.01131, ecapa_loss=0.0002052, whisper_loss=0.09437, over 3931828.56 frames. ], batch size: 87, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:17:10,637 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 13 from LS+wenet, 21 from Vox, 29 from AS 2024-08-11 08:17:16,798 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 15 from Vox, 48 from AS 2024-08-11 08:17:32,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1002790.0, ans=0.0 2024-08-11 08:17:36,372 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 21 from Vox, 38 from AS 2024-08-11 08:17:54,337 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 22 from Vox, 28 from AS 2024-08-11 08:17:55,857 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.657e+01 3.097e+01 3.589e+01 1.012e+02, threshold=6.194e+01, percent-clipped=1.0 2024-08-11 08:18:05,718 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 from AS 2024-08-11 08:18:10,268 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13350, loss[loss=0.08026, beats_loss=0.01746, ecapa_loss=0.0001908, whisper_loss=0.06089, over 16213.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01131, ecapa_loss=0.0002042, whisper_loss=0.09437, over 3914766.13 frames. ], batch size: 68, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:18:26,094 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 19 from Vox, 22 from AS 2024-08-11 08:18:39,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.94 vs. 
limit=10.0 2024-08-11 08:18:42,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1003290.0, ans=0.125 2024-08-11 08:18:43,770 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 21 from LS+wenet, 23 from Vox, 49 from AS 2024-08-11 08:18:43,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1003290.0, ans=0.0 2024-08-11 08:19:10,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.10 vs. limit=15.0 2024-08-11 08:19:11,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1003490.0, ans=0.0 2024-08-11 08:19:18,650 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 from AS 2024-08-11 08:19:29,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13400, loss[loss=0.08523, beats_loss=0.01285, ecapa_loss=0.0002055, whisper_loss=0.07033, over 18184.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01144, ecapa_loss=0.0002042, whisper_loss=0.09347, over 3867662.42 frames. ], batch size: 78, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:19:54,523 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 from AS 2024-08-11 08:19:59,329 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS 2024-08-11 08:20:04,859 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
16 from LS+wenet, 15 from Vox, 27 from AS 2024-08-11 08:20:05,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1003790.0, ans=0.07 2024-08-11 08:20:07,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1003790.0, ans=0.0 2024-08-11 08:20:09,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1003790.0, ans=0.125 2024-08-11 08:20:09,979 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 from AS 2024-08-11 08:20:34,181 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.700e+01 3.139e+01 3.511e+01 8.019e+01, threshold=6.278e+01, percent-clipped=1.0 2024-08-11 08:20:36,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1003990.0, ans=0.125 2024-08-11 08:20:42,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1003990.0, ans=0.035 2024-08-11 08:20:43,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1003990.0, ans=0.0 2024-08-11 08:20:47,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13450, loss[loss=0.09571, beats_loss=0.01103, ecapa_loss=0.0002146, whisper_loss=0.08253, over 16815.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01145, ecapa_loss=0.0002047, whisper_loss=0.09336, over 3866246.90 frames. ], batch size: 67, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:20:57,027 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 23 from Vox, 30 from AS 2024-08-11 08:21:14,984 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
22 from LS+wenet, 18 from Vox, 42 from AS 2024-08-11 08:21:15,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1004190.0, ans=10.0 2024-08-11 08:22:01,455 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 13 from Vox, 31 from AS 2024-08-11 08:22:01,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1004490.0, ans=0.125 2024-08-11 08:22:05,944 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13500, loss[loss=0.1084, beats_loss=0.0129, ecapa_loss=0.0001591, whisper_loss=0.09389, over 23103.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.0114, ecapa_loss=0.0002057, whisper_loss=0.09379, over 3856962.73 frames. ], batch size: 90, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:22:10,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1004590.0, ans=0.025 2024-08-11 08:22:20,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1004690.0, ans=0.125 2024-08-11 08:22:32,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1004690.0, ans=0.125 2024-08-11 08:22:35,543 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.17 vs. limit=10.0 2024-08-11 08:22:35,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2024-08-11 08:22:48,924 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
22 from LS+wenet, 15 from Vox, 31 from AS 2024-08-11 08:22:58,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1004890.0, ans=0.07 2024-08-11 08:23:02,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1004990.0, ans=0.2 2024-08-11 08:23:04,782 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.720e+01 3.065e+01 3.481e+01 5.636e+01, threshold=6.129e+01, percent-clipped=0.0 2024-08-11 08:23:18,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13550, loss[loss=0.1028, beats_loss=0.01227, ecapa_loss=0.0002154, whisper_loss=0.08839, over 22211.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01149, ecapa_loss=0.0002047, whisper_loss=0.09322, over 3857771.48 frames. ], batch size: 93, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:23:21,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1005090.0, ans=0.125 2024-08-11 08:23:24,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1005090.0, ans=0.0 2024-08-11 08:23:30,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1005090.0, ans=0.125 2024-08-11 08:23:35,948 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 27 from Vox, 34 from AS 2024-08-11 08:23:46,333 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:23:54,719 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 from AS 2024-08-11 08:23:56,103 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 23 from Vox, 22 from AS 2024-08-11 08:24:02,050 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 18 from Vox, 42 from AS 2024-08-11 08:24:11,182 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-08-11 08:24:12,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1005390.0, ans=0.0 2024-08-11 08:24:13,847 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 29 from Vox, 27 from AS 2024-08-11 08:24:29,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1005490.0, ans=0.125 2024-08-11 08:24:32,052 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13600, loss[loss=0.09588, beats_loss=0.01336, ecapa_loss=0.0002114, whisper_loss=0.0804, over 16205.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01149, ecapa_loss=0.0002036, whisper_loss=0.09341, over 3872097.87 frames. ], batch size: 67, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:24:42,639 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 32 from LS+wenet, 21 from Vox, 42 from AS 2024-08-11 08:24:53,509 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.70 vs. limit=10.0 2024-08-11 08:24:58,793 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.811e+00 2024-08-11 08:25:02,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1005790.0, ans=0.125 2024-08-11 08:25:05,962 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
30 from LS+wenet, 13 from Vox, 36 from AS 2024-08-11 08:25:12,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1005790.0, ans=0.07 2024-08-11 08:25:13,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1005790.0, ans=0.1 2024-08-11 08:25:20,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1005890.0, ans=0.1 2024-08-11 08:25:31,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.811e+01 3.158e+01 3.669e+01 1.616e+02, threshold=6.317e+01, percent-clipped=3.0 2024-08-11 08:25:44,427 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13650, loss[loss=0.08495, beats_loss=0.01275, ecapa_loss=0.000185, whisper_loss=0.07035, over 15290.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01147, ecapa_loss=0.0002046, whisper_loss=0.09339, over 3850104.68 frames. ], batch size: 59, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:25:52,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1006090.0, ans=0.125 2024-08-11 08:26:07,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1006190.0, ans=0.2 2024-08-11 08:26:11,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1006190.0, ans=0.0 2024-08-11 08:26:20,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.18 vs. limit=12.0 2024-08-11 08:26:44,650 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 13 from Vox, 32 from AS 2024-08-11 08:26:50,258 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
24 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 08:26:51,577 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 08:27:00,991 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13700, loss[loss=0.1046, beats_loss=0.01159, ecapa_loss=0.000255, whisper_loss=0.09051, over 22051.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01156, ecapa_loss=0.0002036, whisper_loss=0.09305, over 3864728.87 frames. ], batch size: 94, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:27:17,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0 2024-08-11 08:27:26,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1006690.0, ans=0.0 2024-08-11 08:27:48,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1006890.0, ans=0.125 2024-08-11 08:27:57,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1006890.0, ans=0.1 2024-08-11 08:28:02,540 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.699e+01 3.024e+01 3.641e+01 8.253e+01, threshold=6.049e+01, percent-clipped=1.0 2024-08-11 08:28:03,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1006990.0, ans=0.2 2024-08-11 08:28:15,846 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13750, loss[loss=0.1038, beats_loss=0.01245, ecapa_loss=0.0002037, whisper_loss=0.0893, over 20473.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01159, ecapa_loss=0.0002051, whisper_loss=0.09255, over 3874264.27 frames. 
], batch size: 80, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:28:22,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1007090.0, ans=0.2 2024-08-11 08:28:28,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1007190.0, ans=0.125 2024-08-11 08:28:28,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1007190.0, ans=0.0 2024-08-11 08:29:13,092 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-11 08:29:19,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1007490.0, ans=0.125 2024-08-11 08:29:28,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1007590.0, ans=0.05 2024-08-11 08:29:30,285 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13800, loss[loss=0.09931, beats_loss=0.01063, ecapa_loss=0.0002224, whisper_loss=0.08646, over 15110.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01149, ecapa_loss=0.0002059, whisper_loss=0.09302, over 3879627.82 frames. ], batch size: 55, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:29:38,536 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 08:30:01,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1007790.0, ans=0.0 2024-08-11 08:30:05,941 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
28 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 08:30:15,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1007790.0, ans=0.1 2024-08-11 08:30:34,669 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 08:30:35,994 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.572e+01 2.803e+01 3.077e+01 5.296e+01, threshold=5.605e+01, percent-clipped=0.0 2024-08-11 08:30:36,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1007990.0, ans=0.0 2024-08-11 08:30:37,579 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 08:30:40,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.74 vs. limit=22.5 2024-08-11 08:30:44,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1007990.0, ans=0.025 2024-08-11 08:30:47,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1007990.0, ans=0.125 2024-08-11 08:30:49,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13850, loss[loss=0.1169, beats_loss=0.01019, ecapa_loss=0.0002267, whisper_loss=0.1044, over 22300.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01139, ecapa_loss=0.0002073, whisper_loss=0.09383, over 3910952.77 frames. ], batch size: 91, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:31:04,699 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 08:31:06,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1008190.0, ans=0.1 2024-08-11 08:31:06,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1008190.0, ans=0.0 2024-08-11 08:31:10,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1008190.0, ans=0.0 2024-08-11 08:31:20,711 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 08:31:30,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.56 vs. limit=10.0 2024-08-11 08:31:52,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1008390.0, ans=0.2 2024-08-11 08:32:10,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13900, loss[loss=0.09156, beats_loss=0.01234, ecapa_loss=0.0002227, whisper_loss=0.07699, over 21840.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01139, ecapa_loss=0.0002056, whisper_loss=0.09427, over 3912647.37 frames. ], batch size: 92, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:32:11,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1008590.0, ans=0.1 2024-08-11 08:32:15,298 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 08:32:27,958 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-11 08:32:37,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.97 vs. 
limit=6.0 2024-08-11 08:33:12,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=12.0 2024-08-11 08:33:14,117 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.808e+01 3.104e+01 3.560e+01 5.037e+01, threshold=6.208e+01, percent-clipped=0.0 2024-08-11 08:33:21,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1008990.0, ans=0.5 2024-08-11 08:33:27,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1009090.0, ans=0.1 2024-08-11 08:33:27,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1009090.0, ans=0.125 2024-08-11 08:33:28,062 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 13950, loss[loss=0.09834, beats_loss=0.01432, ecapa_loss=0.0001477, whisper_loss=0.08254, over 17406.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01135, ecapa_loss=0.000204, whisper_loss=0.09485, over 3920507.84 frames. ], batch size: 69, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:34:04,113 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 08:34:06,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1009290.0, ans=0.0 2024-08-11 08:34:28,776 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=15.0 2024-08-11 08:34:33,672 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-11 08:34:34,680 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
25 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-11 08:34:36,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1009490.0, ans=0.0 2024-08-11 08:34:48,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 14000, loss[loss=0.1054, beats_loss=0.01321, ecapa_loss=0.0001569, whisper_loss=0.09059, over 22561.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01133, ecapa_loss=0.0002028, whisper_loss=0.09471, over 3912213.33 frames. ], batch size: 86, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:34:54,911 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 08:35:07,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1009690.0, ans=0.07 2024-08-11 08:35:22,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1009790.0, ans=0.125 2024-08-11 08:35:39,728 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 08:35:41,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1009890.0, ans=0.2 2024-08-11 08:35:56,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1009990.0, ans=0.5 2024-08-11 08:35:57,137 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.710e+01 3.006e+01 3.538e+01 6.784e+01, threshold=6.013e+01, percent-clipped=1.0 2024-08-11 08:36:01,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2024-08-11 08:36:12,010 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 14050, loss[loss=0.1053, beats_loss=0.01151, ecapa_loss=0.0001877, whisper_loss=0.09188, over 22735.00 frames. 
], tot_loss[loss=0.1074, beats_loss=0.01137, ecapa_loss=0.0002027, whisper_loss=0.09404, over 3891364.71 frames. ], batch size: 92, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:36:13,480 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 08:36:26,178 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 42 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 08:36:48,186 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.614e+02 2024-08-11 08:37:11,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1010390.0, ans=0.125 2024-08-11 08:37:13,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1010390.0, ans=0.125 2024-08-11 08:37:27,330 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-11 08:37:37,629 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 14100, loss[loss=0.1081, beats_loss=0.008696, ecapa_loss=0.0002371, whisper_loss=0.09708, over 15472.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01128, ecapa_loss=0.0002035, whisper_loss=0.09475, over 3867083.84 frames. ], batch size: 63, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:37:43,135 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 08:38:07,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1010690.0, ans=0.0 2024-08-11 08:38:23,661 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 08:38:49,499 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.624e+01 2.945e+01 3.408e+01 4.744e+01, threshold=5.889e+01, percent-clipped=0.0 2024-08-11 08:39:05,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 14150, loss[loss=0.1055, beats_loss=0.01134, ecapa_loss=0.0001881, whisper_loss=0.09224, over 21966.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01133, ecapa_loss=0.000204, whisper_loss=0.09359, over 3839746.92 frames. ], batch size: 85, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:39:26,100 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 08:39:26,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1011190.0, ans=0.07 2024-08-11 08:39:44,150 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-11 08:39:54,248 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 31 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 08:40:26,690 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 11 from Vox, 44 fro AS 2024-08-11 08:40:30,486 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 08:40:30,878 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.684e+00 2024-08-11 08:40:31,576 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 14200, loss[loss=0.1086, beats_loss=0.01055, ecapa_loss=0.0002108, whisper_loss=0.09593, over 15651.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01141, ecapa_loss=0.0002037, whisper_loss=0.09303, over 3872223.25 frames. 
], batch size: 61, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:40:38,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1011590.0, ans=0.0 2024-08-11 08:40:40,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1011590.0, ans=0.1 2024-08-11 08:40:54,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1011690.0, ans=0.125 2024-08-11 08:41:10,475 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:41:15,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1011790.0, ans=0.04949747468305833 2024-08-11 08:41:15,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1011790.0, ans=0.1 2024-08-11 08:42:08,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1011990.0, ans=0.125 2024-08-11 08:42:12,393 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.734e+01 3.043e+01 3.584e+01 5.331e+01, threshold=6.086e+01, percent-clipped=0.0 2024-08-11 08:42:26,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1011990.0, ans=0.125 2024-08-11 08:42:29,560 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 14250, loss[loss=0.1122, beats_loss=0.01066, ecapa_loss=0.0002281, whisper_loss=0.09923, over 22182.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01143, ecapa_loss=0.0002027, whisper_loss=0.09324, over 3890456.57 frames. 
], batch size: 90, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:42:31,483 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 18 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-11 08:43:02,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1012190.0, ans=0.0 2024-08-11 08:43:12,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1012290.0, ans=0.125 2024-08-11 08:43:44,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2024-08-11 08:43:57,736 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 14300, loss[loss=0.1049, beats_loss=0.01091, ecapa_loss=0.0002057, whisper_loss=0.09198, over 23443.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0115, ecapa_loss=0.0002017, whisper_loss=0.09306, over 3911681.57 frames. ], batch size: 91, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:44:09,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1012590.0, ans=0.125 2024-08-11 08:44:11,706 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 08:44:12,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1012590.0, ans=0.04949747468305833 2024-08-11 08:44:17,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.68 vs. 
limit=12.0 2024-08-11 08:44:31,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1012790.0, ans=0.0 2024-08-11 08:44:56,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.38 vs. limit=22.5 2024-08-11 08:45:05,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.111e+01 2.720e+01 3.044e+01 3.421e+01 5.497e+01, threshold=6.088e+01, percent-clipped=0.0 2024-08-11 08:45:05,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1012990.0, ans=0.125 2024-08-11 08:45:19,359 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 14350, loss[loss=0.1023, beats_loss=0.01306, ecapa_loss=0.0002004, whisper_loss=0.08722, over 19395.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01146, ecapa_loss=0.0002021, whisper_loss=0.09296, over 3906225.82 frames. ], batch size: 76, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:46:15,666 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-11 08:46:15,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1013390.0, ans=0.125 2024-08-11 08:46:18,233 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2024-08-11 08:46:24,094 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-11 08:46:27,461 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 20 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-11 08:46:36,761 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
23 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-11 08:46:41,036 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 14400, loss[loss=0.09242, beats_loss=0.01343, ecapa_loss=0.0001828, whisper_loss=0.07716, over 13658.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0114, ecapa_loss=0.0002035, whisper_loss=0.09267, over 3910151.89 frames. ], batch size: 53, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:46:44,113 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 08:46:56,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1013690.0, ans=0.1 2024-08-11 08:47:02,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1013690.0, ans=0.0 2024-08-11 08:47:15,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.39 vs. limit=15.0 2024-08-11 08:47:23,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1013790.0, ans=0.125 2024-08-11 08:47:46,103 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.704e+01 3.131e+01 3.618e+01 5.413e+01, threshold=6.263e+01, percent-clipped=0.0 2024-08-11 08:48:00,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 7, batch 14450, loss[loss=0.09751, beats_loss=0.01039, ecapa_loss=0.0002328, whisper_loss=0.08479, over 19870.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01142, ecapa_loss=0.0002043, whisper_loss=0.093, over 3903581.62 frames. 
], batch size: 83, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:48:12,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1014090.0, ans=0.1 2024-08-11 08:48:16,752 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 08:48:20,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1014190.0, ans=0.125 2024-08-11 08:48:45,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1014390.0, ans=0.0 2024-08-11 08:49:46,117 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 0, loss[loss=0.07465, beats_loss=0.0106, ecapa_loss=0.0002511, whisper_loss=0.06154, over 14072.00 frames. ], tot_loss[loss=0.07465, beats_loss=0.0106, ecapa_loss=0.0002511, whisper_loss=0.06154, over 14072.00 frames. ], batch size: 59, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:49:46,118 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 08:50:29,070 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on ASR_libri: loss=0.2579, beats_loss=0, ecapa_loss=0.0006499, whisper_loss=0.2514, over 922467.00 frames. 2024-08-11 08:50:45,274 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on SV_voxceleb1: loss=0.005446, beats_loss=0, ecapa_loss=0.0005446, whisper_loss=0, over 939242.00 frames. 2024-08-11 08:52:49,664 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on AT_audioset: loss=0.02532, beats_loss=0.02532, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 08:52:49,668 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 08:52:53,654 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 08:52:54,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1014470.0, ans=0.0 2024-08-11 08:53:08,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1014470.0, ans=0.125 2024-08-11 08:53:44,444 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 08:53:46,641 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 08:54:02,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1014670.0, ans=0.125 2024-08-11 08:54:08,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1014670.0, ans=0.0 2024-08-11 08:54:19,292 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 08:54:32,883 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 08:54:43,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1014870.0, ans=0.0 2024-08-11 08:55:05,094 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 50, loss[loss=0.1058, beats_loss=0.01151, ecapa_loss=0.0002446, whisper_loss=0.09181, over 16396.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01106, ecapa_loss=0.0002073, whisper_loss=0.09289, over 904564.16 frames. 
], batch size: 68, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:55:06,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1014970.0, ans=0.125 2024-08-11 08:55:12,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.434e+01 2.926e+01 3.335e+01 3.829e+01 6.583e+01, threshold=6.671e+01, percent-clipped=1.0 2024-08-11 08:55:20,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1014970.0, ans=0.0 2024-08-11 08:55:34,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1015070.0, ans=0.125 2024-08-11 08:55:42,049 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.99 vs. limit=22.5 2024-08-11 08:55:52,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1015070.0, ans=0.1 2024-08-11 08:56:15,461 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 08:56:30,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-11 08:57:07,104 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 100, loss[loss=0.1295, beats_loss=0.008895, ecapa_loss=0.0002298, whisper_loss=0.1183, over 23810.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01084, ecapa_loss=0.0002069, whisper_loss=0.09363, over 1574190.75 frames. 
], batch size: 91, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:57:19,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1015470.0, ans=0.125 2024-08-11 08:57:24,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1015470.0, ans=0.035 2024-08-11 08:57:26,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1015470.0, ans=0.1 2024-08-11 08:57:51,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1015570.0, ans=0.2 2024-08-11 08:57:55,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1015670.0, ans=0.125 2024-08-11 08:58:00,119 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.126e+00 2024-08-11 08:58:02,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1015670.0, ans=0.125 2024-08-11 08:58:04,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1015670.0, ans=0.2 2024-08-11 08:58:25,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1015770.0, ans=0.1 2024-08-11 08:58:58,580 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 150, loss[loss=0.1139, beats_loss=0.01143, ecapa_loss=0.0002043, whisper_loss=0.1004, over 15936.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01081, ecapa_loss=0.0002053, whisper_loss=0.09404, over 2060309.18 frames. 
], batch size: 61, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:58:59,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1015970.0, ans=0.0 2024-08-11 08:59:04,479 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 2.999e+01 3.323e+01 3.859e+01 6.934e+01, threshold=6.647e+01, percent-clipped=1.0 2024-08-11 08:59:58,215 INFO [train_multi_KD3.py:844] (2/4) A total of 97 cuts. 26 from LS+wenet, 33 from Vox, 38 fro AS 2024-08-11 09:00:06,798 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 09:00:12,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1016370.0, ans=0.1 2024-08-11 09:00:24,441 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 200, loss[loss=0.1052, beats_loss=0.01142, ecapa_loss=0.0001869, whisper_loss=0.09196, over 17474.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01088, ecapa_loss=0.0002055, whisper_loss=0.09258, over 2449995.70 frames. 
], batch size: 68, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:00:33,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1016470.0, ans=0.0 2024-08-11 09:00:47,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1016570.0, ans=0.125 2024-08-11 09:00:52,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1016570.0, ans=0.5 2024-08-11 09:00:53,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1016570.0, ans=0.5 2024-08-11 09:01:09,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-08-11 09:01:23,033 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-11 09:01:34,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1016870.0, ans=0.0 2024-08-11 09:01:36,776 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 26 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 09:01:40,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1016870.0, ans=0.2 2024-08-11 09:01:41,410 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 09:01:44,101 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 250, loss[loss=0.08986, beats_loss=0.01302, ecapa_loss=0.0002176, whisper_loss=0.07466, over 20454.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01095, ecapa_loss=0.0002023, whisper_loss=0.09305, over 2741255.90 frames. 
], batch size: 87, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:01:48,903 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.577e+01 2.891e+01 3.229e+01 6.128e+01, threshold=5.781e+01, percent-clipped=0.0 2024-08-11 09:02:43,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1017270.0, ans=0.125 2024-08-11 09:02:43,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1017270.0, ans=0.125 2024-08-11 09:03:01,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 300, loss[loss=0.09884, beats_loss=0.01503, ecapa_loss=0.0002053, whisper_loss=0.08175, over 22068.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01103, ecapa_loss=0.0002023, whisper_loss=0.0927, over 2978820.22 frames. ], batch size: 90, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:03:08,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1017470.0, ans=0.125 2024-08-11 09:03:11,459 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 09:03:19,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1017570.0, ans=0.2 2024-08-11 09:03:27,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1017570.0, ans=0.0 2024-08-11 09:03:31,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=1017670.0, ans=0.02 2024-08-11 09:03:38,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.25 vs. 
limit=6.0 2024-08-11 09:03:53,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1017770.0, ans=0.0 2024-08-11 09:04:17,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 350, loss[loss=0.1143, beats_loss=0.0114, ecapa_loss=0.0001792, whisper_loss=0.1011, over 17460.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.0002006, whisper_loss=0.09226, over 3167397.98 frames. ], batch size: 70, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:04:22,256 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.490e+01 2.836e+01 3.239e+01 6.329e+01, threshold=5.671e+01, percent-clipped=2.0 2024-08-11 09:04:26,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1017970.0, ans=0.125 2024-08-11 09:04:39,138 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 09:04:50,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1018170.0, ans=0.125 2024-08-11 09:04:54,603 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 09:04:59,156 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
24 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-11 09:05:15,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1018270.0, ans=0.1 2024-08-11 09:05:15,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1018270.0, ans=0.125 2024-08-11 09:05:26,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1018370.0, ans=0.1 2024-08-11 09:05:33,323 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 400, loss[loss=0.09665, beats_loss=0.01254, ecapa_loss=0.0001356, whisper_loss=0.08276, over 15529.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01114, ecapa_loss=0.000198, whisper_loss=0.09222, over 3304977.81 frames. ], batch size: 58, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:05:40,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1018470.0, ans=0.0 2024-08-11 09:05:43,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1018470.0, ans=0.125 2024-08-11 09:05:47,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2024-08-11 09:05:50,465 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-11 09:06:01,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1018570.0, ans=0.09899494936611666 2024-08-11 09:06:04,174 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
20 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 09:06:10,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.45 vs. limit=22.5 2024-08-11 09:06:13,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1018670.0, ans=0.0 2024-08-11 09:06:17,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.38 vs. limit=22.5 2024-08-11 09:06:20,798 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 09:06:33,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1018770.0, ans=0.125 2024-08-11 09:06:39,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-08-11 09:06:40,563 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.622e+00 2024-08-11 09:06:51,073 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 450, loss[loss=0.1379, beats_loss=0.006905, ecapa_loss=0.0002436, whisper_loss=0.1286, over 18228.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01115, ecapa_loss=0.0001985, whisper_loss=0.09223, over 3432136.84 frames. 
], batch size: 69, lr: 8.15e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:06:55,191 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.612e+01 2.893e+01 3.369e+01 4.521e+01, threshold=5.785e+01, percent-clipped=0.0 2024-08-11 09:07:42,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1019270.0, ans=0.125 2024-08-11 09:07:43,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1019270.0, ans=0.0 2024-08-11 09:07:50,820 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-11 09:07:54,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1019370.0, ans=0.0 2024-08-11 09:07:55,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1019370.0, ans=0.125 2024-08-11 09:08:09,834 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 500, loss[loss=0.1191, beats_loss=0.00864, ecapa_loss=0.0001844, whisper_loss=0.1086, over 19572.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01111, ecapa_loss=0.0001977, whisper_loss=0.0922, over 3518974.44 frames. ], batch size: 74, lr: 8.15e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:08:12,321 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 09:08:25,062 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 09:08:25,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1019570.0, ans=0.125 2024-08-11 09:08:27,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1019570.0, ans=0.125 2024-08-11 09:08:49,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1019670.0, ans=0.125 2024-08-11 09:09:14,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1019870.0, ans=0.0 2024-08-11 09:09:32,028 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2024-08-11 09:09:32,372 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 550, loss[loss=0.1158, beats_loss=0.0119, ecapa_loss=0.0001963, whisper_loss=0.1019, over 22411.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0111, ecapa_loss=0.0001992, whisper_loss=0.09265, over 3604104.59 frames. ], batch size: 88, lr: 8.15e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:09:37,525 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.649e+01 3.106e+01 3.487e+01 7.469e+01, threshold=6.212e+01, percent-clipped=4.0 2024-08-11 09:09:40,567 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 09:10:10,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1020170.0, ans=0.0 2024-08-11 09:10:23,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1020270.0, ans=0.125 2024-08-11 09:10:32,422 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 09:10:34,432 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5 2024-08-11 09:10:47,472 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 600, loss[loss=0.1348, beats_loss=0.008865, ecapa_loss=0.0001906, whisper_loss=0.124, over 17010.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01114, ecapa_loss=0.0001992, whisper_loss=0.0925, over 3653510.12 frames. ], batch size: 64, lr: 8.15e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:10:48,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1020470.0, ans=0.0 2024-08-11 09:11:22,499 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-11 09:11:27,201 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 09:11:33,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1020770.0, ans=0.1 2024-08-11 09:11:38,541 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 09:11:55,954 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
26 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 09:11:57,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1020870.0, ans=0.125 2024-08-11 09:11:59,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1020870.0, ans=0.0 2024-08-11 09:12:01,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1020870.0, ans=0.2 2024-08-11 09:12:05,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1020970.0, ans=0.0 2024-08-11 09:12:06,100 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 650, loss[loss=0.1104, beats_loss=0.01229, ecapa_loss=0.000147, whisper_loss=0.09667, over 23137.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01116, ecapa_loss=0.0001966, whisper_loss=0.0926, over 3683181.97 frames. ], batch size: 89, lr: 8.15e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:12:10,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.651e+01 2.850e+01 3.204e+01 4.737e+01, threshold=5.700e+01, percent-clipped=0.0 2024-08-11 09:12:12,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-11 09:12:14,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2024-08-11 09:12:21,882 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-11 09:12:23,740 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
26 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-11 09:12:35,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1021170.0, ans=0.0 2024-08-11 09:12:37,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1021170.0, ans=0.125 2024-08-11 09:13:05,135 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.360e+01 2024-08-11 09:13:17,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1021370.0, ans=0.0 2024-08-11 09:13:21,826 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 700, loss[loss=0.116, beats_loss=0.006403, ecapa_loss=0.0002244, whisper_loss=0.1073, over 13359.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01119, ecapa_loss=0.0001958, whisper_loss=0.09275, over 3694458.79 frames. ], batch size: 53, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:13:25,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1021470.0, ans=0.125 2024-08-11 09:13:27,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1021470.0, ans=0.125 2024-08-11 09:13:34,704 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-11 09:13:35,412 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.69 vs. limit=22.5 2024-08-11 09:13:46,761 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
23 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 09:14:01,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.92 vs. limit=6.0 2024-08-11 09:14:05,360 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0 2024-08-11 09:14:07,785 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 09:14:10,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1021770.0, ans=0.1 2024-08-11 09:14:12,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1021770.0, ans=0.2 2024-08-11 09:14:37,669 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 750, loss[loss=0.1102, beats_loss=0.009591, ecapa_loss=0.0002267, whisper_loss=0.09834, over 22903.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01113, ecapa_loss=0.0001942, whisper_loss=0.09336, over 3740928.15 frames. ], batch size: 91, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:14:42,539 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.660e+01 3.127e+01 3.627e+01 6.783e+01, threshold=6.254e+01, percent-clipped=6.0 2024-08-11 09:14:44,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1021970.0, ans=0.1 2024-08-11 09:14:50,400 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 24 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-11 09:15:08,790 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
21 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-11 09:15:15,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1022170.0, ans=0.125 2024-08-11 09:15:44,882 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2024-08-11 09:15:48,651 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 09:15:48,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1022370.0, ans=0.0 2024-08-11 09:15:48,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1022370.0, ans=0.0 2024-08-11 09:15:54,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 800, loss[loss=0.108, beats_loss=0.01277, ecapa_loss=0.0001818, whisper_loss=0.09345, over 21345.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01115, ecapa_loss=0.0001954, whisper_loss=0.09343, over 3792329.71 frames. ], batch size: 86, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:15:55,776 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2024-08-11 09:15:58,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1022470.0, ans=0.1 2024-08-11 09:16:05,660 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.99 vs. limit=6.0 2024-08-11 09:16:08,965 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.15 vs. 
limit=15.0 2024-08-11 09:16:09,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1022570.0, ans=15.0 2024-08-11 09:16:09,676 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 09:16:14,092 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 8 from Vox, 29 fro AS 2024-08-11 09:16:24,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1022670.0, ans=0.0 2024-08-11 09:16:30,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1022670.0, ans=0.1 2024-08-11 09:16:40,048 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 13 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 09:17:07,225 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 850, loss[loss=0.1237, beats_loss=0.007311, ecapa_loss=0.0002714, whisper_loss=0.1137, over 19821.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0112, ecapa_loss=0.0001951, whisper_loss=0.09243, over 3771834.79 frames. ], batch size: 84, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:17:11,444 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.661e+01 2.916e+01 3.361e+01 8.910e+01, threshold=5.831e+01, percent-clipped=1.0 2024-08-11 09:17:13,209 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-11 09:17:14,503 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
20 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 09:17:16,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1022970.0, ans=0.2 2024-08-11 09:17:23,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1023070.0, ans=0.125 2024-08-11 09:17:37,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1023170.0, ans=0.125 2024-08-11 09:17:37,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1023170.0, ans=0.0 2024-08-11 09:17:55,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1023270.0, ans=0.125 2024-08-11 09:18:04,756 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 09:18:06,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1023370.0, ans=0.0 2024-08-11 09:18:15,439 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 36 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 09:18:21,903 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 900, loss[loss=0.1156, beats_loss=0.01067, ecapa_loss=0.0001891, whisper_loss=0.103, over 22389.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01125, ecapa_loss=0.0001946, whisper_loss=0.09176, over 3762260.48 frames. ], batch size: 86, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:18:32,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1023470.0, ans=0.0 2024-08-11 09:18:37,915 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-11 09:18:38,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1023570.0, ans=0.125 2024-08-11 09:18:42,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1023570.0, ans=0.125 2024-08-11 09:18:48,894 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 09:18:53,018 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.72 vs. limit=10.0 2024-08-11 09:19:00,625 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 27 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-11 09:19:00,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1023670.0, ans=0.125 2024-08-11 09:19:09,940 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 09:19:14,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1023770.0, ans=0.1 2024-08-11 09:19:16,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1023770.0, ans=0.2 2024-08-11 09:19:16,990 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
20 from LS+wenet, 20 from Vox, 15 fro AS 2024-08-11 09:19:23,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1023870.0, ans=0.125 2024-08-11 09:19:26,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1023870.0, ans=0.125 2024-08-11 09:19:34,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1023870.0, ans=0.125 2024-08-11 09:19:36,596 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 950, loss[loss=0.1242, beats_loss=0.008706, ecapa_loss=0.0001908, whisper_loss=0.1136, over 21263.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01116, ecapa_loss=0.0001946, whisper_loss=0.09266, over 3768178.90 frames. ], batch size: 81, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:19:37,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1023970.0, ans=0.125 2024-08-11 09:19:40,288 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.622e+01 2.876e+01 3.425e+01 6.209e+01, threshold=5.753e+01, percent-clipped=1.0 2024-08-11 09:19:40,547 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 19 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-11 09:19:43,119 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-11 09:19:46,850 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 09:19:47,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1023970.0, ans=0.1 2024-08-11 09:19:58,165 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 09:19:58,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1024070.0, ans=0.2 2024-08-11 09:20:27,812 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 09:20:36,001 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=8.163e-03 2024-08-11 09:20:49,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1024370.0, ans=0.2 2024-08-11 09:21:00,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1000, loss[loss=0.1142, beats_loss=0.01233, ecapa_loss=0.0001409, whisper_loss=0.1004, over 23987.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01119, ecapa_loss=0.0001942, whisper_loss=0.092, over 3738706.83 frames. ], batch size: 92, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:21:28,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1024570.0, ans=0.2 2024-08-11 09:21:36,120 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2024-08-11 09:21:42,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1024670.0, ans=0.07 2024-08-11 09:22:00,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1024770.0, ans=0.0 2024-08-11 09:22:02,076 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 35 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-11 09:22:03,615 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 09:22:06,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1024770.0, ans=0.07 2024-08-11 09:22:09,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1024770.0, ans=0.125 2024-08-11 09:22:11,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1024870.0, ans=0.1 2024-08-11 09:22:32,375 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1050, loss[loss=0.08929, beats_loss=0.01405, ecapa_loss=0.0001243, whisper_loss=0.074, over 16436.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0112, ecapa_loss=0.0001943, whisper_loss=0.09259, over 3752455.44 frames. ], batch size: 66, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:22:36,242 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 35 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 09:22:39,275 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.754e+01 3.061e+01 3.548e+01 9.955e+01, threshold=6.122e+01, percent-clipped=1.0 2024-08-11 09:22:42,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1024970.0, ans=0.2 2024-08-11 09:23:13,897 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 09:23:24,321 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 09:23:28,221 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. limit=10.0 2024-08-11 09:23:29,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.86 vs. 
limit=15.0 2024-08-11 09:23:58,303 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 09:24:04,980 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 09:24:21,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1100, loss[loss=0.1402, beats_loss=0.00993, ecapa_loss=0.0001975, whisper_loss=0.1282, over 22588.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01122, ecapa_loss=0.0001931, whisper_loss=0.09325, over 3778448.30 frames. ], batch size: 88, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:24:22,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1025470.0, ans=0.125 2024-08-11 09:25:19,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1025670.0, ans=0.1 2024-08-11 09:25:19,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1025670.0, ans=0.0 2024-08-11 09:25:24,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. 
limit=15.0 2024-08-11 09:25:29,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1025770.0, ans=0.125 2024-08-11 09:25:35,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1025770.0, ans=0.125 2024-08-11 09:25:38,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1025770.0, ans=0.0 2024-08-11 09:25:48,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1025870.0, ans=0.125 2024-08-11 09:26:02,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1025870.0, ans=0.1 2024-08-11 09:26:07,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1025970.0, ans=0.2 2024-08-11 09:26:08,731 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1150, loss[loss=0.1066, beats_loss=0.01077, ecapa_loss=0.0001957, whisper_loss=0.09392, over 20827.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01121, ecapa_loss=0.0001929, whisper_loss=0.09308, over 3781108.19 frames. ], batch size: 83, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:26:14,340 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.696e+01 3.045e+01 3.408e+01 7.482e+01, threshold=6.090e+01, percent-clipped=2.0 2024-08-11 09:26:33,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1026070.0, ans=0.1 2024-08-11 09:26:41,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1026070.0, ans=0.2 2024-08-11 09:26:49,900 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
20 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-11 09:26:59,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1026170.0, ans=0.0 2024-08-11 09:27:09,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.88 vs. limit=22.5 2024-08-11 09:27:18,673 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-11 09:27:23,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.05 vs. limit=22.5 2024-08-11 09:27:36,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1026370.0, ans=0.09899494936611666 2024-08-11 09:27:41,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1026370.0, ans=0.125 2024-08-11 09:27:45,368 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.556e-02 2024-08-11 09:27:47,068 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 09:27:54,688 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1200, loss[loss=0.1006, beats_loss=0.01396, ecapa_loss=0.0001379, whisper_loss=0.0853, over 16259.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01124, ecapa_loss=0.0001912, whisper_loss=0.09336, over 3785209.18 frames. ], batch size: 60, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:28:02,298 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.32 vs. limit=15.0 2024-08-11 09:28:07,862 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-11 09:28:10,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1026470.0, ans=0.0 2024-08-11 09:28:14,466 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-11 09:28:27,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1026670.0, ans=0.125 2024-08-11 09:28:34,793 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 09:28:36,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1026670.0, ans=0.1 2024-08-11 09:28:39,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2024-08-11 09:28:51,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=12.0 2024-08-11 09:28:52,300 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 09:28:59,327 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
23 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 09:29:05,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1026870.0, ans=0.125 2024-08-11 09:29:08,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1026870.0, ans=0.125 2024-08-11 09:29:10,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1026870.0, ans=0.125 2024-08-11 09:29:12,872 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1250, loss[loss=0.1068, beats_loss=0.01062, ecapa_loss=0.0001749, whisper_loss=0.09448, over 20875.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01127, ecapa_loss=0.0001931, whisper_loss=0.0926, over 3819274.05 frames. ], batch size: 81, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:29:17,175 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.549e+01 2.780e+01 3.273e+01 6.263e+01, threshold=5.560e+01, percent-clipped=1.0 2024-08-11 09:29:23,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1026970.0, ans=0.125 2024-08-11 09:29:29,362 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 09:29:40,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1027070.0, ans=0.5 2024-08-11 09:29:54,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1027170.0, ans=0.125 2024-08-11 09:30:06,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1027270.0, ans=0.125 2024-08-11 09:30:11,960 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
16 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 09:30:14,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1027370.0, ans=10.0 2024-08-11 09:30:27,284 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1300, loss[loss=0.1202, beats_loss=0.01217, ecapa_loss=0.0001788, whisper_loss=0.1062, over 22843.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01131, ecapa_loss=0.0001924, whisper_loss=0.09242, over 3816379.99 frames. ], batch size: 89, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:30:29,412 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-11 09:30:30,789 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 09:30:42,080 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 09:30:44,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1027570.0, ans=0.0 2024-08-11 09:30:46,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.23 vs. limit=15.0 2024-08-11 09:31:05,036 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 09:31:05,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1027670.0, ans=0.125 2024-08-11 09:31:44,816 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1350, loss[loss=0.09001, beats_loss=0.0113, ecapa_loss=0.0001764, whisper_loss=0.07695, over 17864.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01132, ecapa_loss=0.0001925, whisper_loss=0.09228, over 3844059.42 frames. 
], batch size: 69, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:31:47,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1027970.0, ans=0.07 2024-08-11 09:31:49,393 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.558e+01 2.922e+01 3.559e+01 4.960e+01, threshold=5.843e+01, percent-clipped=0.0 2024-08-11 09:31:50,426 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 34 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 09:31:50,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1027970.0, ans=0.0 2024-08-11 09:31:56,332 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-11 09:32:05,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2024-08-11 09:32:18,022 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.01 vs. limit=12.0 2024-08-11 09:32:20,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1028170.0, ans=0.125 2024-08-11 09:32:25,127 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2024-08-11 09:32:43,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1028370.0, ans=0.035 2024-08-11 09:32:45,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1028370.0, ans=0.0 2024-08-11 09:32:48,576 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
31 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 09:32:50,151 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 09:32:51,421 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 09:32:56,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1028370.0, ans=0.0 2024-08-11 09:32:57,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1028370.0, ans=0.5 2024-08-11 09:32:59,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1400, loss[loss=0.1005, beats_loss=0.011, ecapa_loss=0.0001779, whisper_loss=0.0877, over 17866.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01125, ecapa_loss=0.0001929, whisper_loss=0.09221, over 3840812.95 frames. ], batch size: 68, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:33:01,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1028470.0, ans=0.0 2024-08-11 09:33:17,238 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-11 09:33:28,313 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-11 09:33:55,027 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 09:34:02,532 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 09:34:04,456 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. 
limit=15.0 2024-08-11 09:34:10,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1028870.0, ans=0.1 2024-08-11 09:34:11,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1028870.0, ans=0.0 2024-08-11 09:34:28,053 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1450, loss[loss=0.07606, beats_loss=0.01415, ecapa_loss=0.0001936, whisper_loss=0.05998, over 20360.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01129, ecapa_loss=0.0001922, whisper_loss=0.09183, over 3840281.77 frames. ], batch size: 88, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:34:33,022 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.516e+01 2.871e+01 3.149e+01 4.386e+01, threshold=5.743e+01, percent-clipped=0.0 2024-08-11 09:34:36,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1028970.0, ans=0.2 2024-08-11 09:34:46,868 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-11 09:34:56,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1029070.0, ans=10.0 2024-08-11 09:34:57,725 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
16 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 09:35:02,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1029170.0, ans=0.125 2024-08-11 09:35:12,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1029170.0, ans=0.125 2024-08-11 09:35:29,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1029270.0, ans=0.125 2024-08-11 09:35:30,619 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 09:35:48,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1500, loss[loss=0.1026, beats_loss=0.01249, ecapa_loss=0.0001832, whisper_loss=0.08831, over 23297.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01129, ecapa_loss=0.0001911, whisper_loss=0.09107, over 3806124.74 frames. ], batch size: 93, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:35:52,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1029470.0, ans=0.0 2024-08-11 09:36:08,344 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
20 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 09:36:11,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1029570.0, ans=0.0 2024-08-11 09:36:24,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1029670.0, ans=0.2 2024-08-11 09:36:33,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1029670.0, ans=0.125 2024-08-11 09:36:42,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1029770.0, ans=0.125 2024-08-11 09:36:56,106 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 09:37:07,489 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1550, loss[loss=0.07181, beats_loss=0.01272, ecapa_loss=0.0001433, whisper_loss=0.05766, over 17984.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01125, ecapa_loss=0.0001901, whisper_loss=0.09134, over 3808250.64 frames. ], batch size: 70, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:37:11,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.727e+01 2.976e+01 3.507e+01 6.642e+01, threshold=5.952e+01, percent-clipped=2.0 2024-08-11 09:37:21,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1029970.0, ans=0.1 2024-08-11 09:37:24,790 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.33 vs. 
limit=6.0 2024-08-11 09:37:41,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1030170.0, ans=0.125 2024-08-11 09:38:00,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1030270.0, ans=0.125 2024-08-11 09:38:08,195 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 09:38:08,483 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 09:38:22,701 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.25 vs. limit=22.5 2024-08-11 09:38:26,105 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1600, loss[loss=0.1012, beats_loss=0.01357, ecapa_loss=0.0001798, whisper_loss=0.08586, over 22952.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01121, ecapa_loss=0.0001903, whisper_loss=0.09226, over 3841209.34 frames. 
], batch size: 93, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:38:55,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1030570.0, ans=0.125 2024-08-11 09:38:57,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1030670.0, ans=0.0 2024-08-11 09:39:04,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1030670.0, ans=0.125 2024-08-11 09:39:17,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1030770.0, ans=0.1 2024-08-11 09:39:25,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1030770.0, ans=0.125 2024-08-11 09:39:39,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1030870.0, ans=0.125 2024-08-11 09:39:39,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=15.0 2024-08-11 09:39:42,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1030970.0, ans=0.125 2024-08-11 09:39:43,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1650, loss[loss=0.07636, beats_loss=0.01269, ecapa_loss=0.0001601, whisper_loss=0.06207, over 18286.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01119, ecapa_loss=0.0001894, whisper_loss=0.09238, over 3848756.58 frames. 
], batch size: 69, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:39:48,495 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.611e+01 2.904e+01 3.448e+01 5.228e+01, threshold=5.808e+01, percent-clipped=0.0 2024-08-11 09:39:58,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2024-08-11 09:40:03,215 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 09:40:38,417 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 09:40:51,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1031370.0, ans=0.125 2024-08-11 09:40:57,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1700, loss[loss=0.07431, beats_loss=0.01474, ecapa_loss=0.0001789, whisper_loss=0.05778, over 18996.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01114, ecapa_loss=0.0001906, whisper_loss=0.09277, over 3835753.29 frames. ], batch size: 81, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:41:11,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1031570.0, ans=0.125 2024-08-11 09:41:11,656 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0 2024-08-11 09:41:13,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1031570.0, ans=0.2 2024-08-11 09:41:13,959 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 09:41:17,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1031570.0, ans=0.0 2024-08-11 09:41:35,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1031670.0, ans=0.0 2024-08-11 09:41:39,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0 2024-08-11 09:41:40,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1031770.0, ans=0.2 2024-08-11 09:41:55,508 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 09:41:56,767 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 09:42:09,264 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1750, loss[loss=0.1066, beats_loss=0.009403, ecapa_loss=0.0001767, whisper_loss=0.09541, over 18798.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01112, ecapa_loss=0.0001904, whisper_loss=0.09224, over 3808535.79 frames. ], batch size: 72, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:42:11,062 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 09:42:13,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.694e+01 3.096e+01 3.648e+01 5.495e+01, threshold=6.193e+01, percent-clipped=0.0 2024-08-11 09:43:08,274 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0 2024-08-11 09:43:17,379 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
22 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 09:43:20,061 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 09:43:21,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1800, loss[loss=0.1178, beats_loss=0.009412, ecapa_loss=0.0001946, whisper_loss=0.1064, over 15315.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01113, ecapa_loss=0.0001911, whisper_loss=0.09158, over 3788772.42 frames. ], batch size: 58, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:43:29,581 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 09:43:31,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1032470.0, ans=0.125 2024-08-11 09:44:26,168 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 9 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 09:44:27,245 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-08-11 09:44:28,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1032870.0, ans=0.5 2024-08-11 09:44:32,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1032870.0, ans=0.125 2024-08-11 09:44:35,054 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1850, loss[loss=0.1098, beats_loss=0.009986, ecapa_loss=0.0002012, whisper_loss=0.09782, over 17347.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01107, ecapa_loss=0.0001906, whisper_loss=0.09218, over 3784513.04 frames. ], batch size: 68, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:44:36,692 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
25 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 09:44:39,689 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.564e+01 2.931e+01 3.381e+01 4.621e+01, threshold=5.861e+01, percent-clipped=0.0 2024-08-11 09:44:41,380 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-11 09:44:47,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1032970.0, ans=0.125 2024-08-11 09:44:51,310 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 09:44:53,017 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 09:44:54,299 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-11 09:44:57,022 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 09:44:57,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1033070.0, ans=0.1 2024-08-11 09:45:20,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2024-08-11 09:45:37,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1033370.0, ans=0.0 2024-08-11 09:45:39,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=15.0 2024-08-11 09:45:47,079 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1900, loss[loss=0.08671, beats_loss=0.01265, ecapa_loss=0.0002109, whisper_loss=0.07196, over 21466.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01117, ecapa_loss=0.0001933, whisper_loss=0.09201, over 3808441.24 frames. 
], batch size: 91, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:45:56,062 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 9 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 09:45:56,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1033470.0, ans=0.125 2024-08-11 09:46:02,216 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 09:46:03,439 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 09:46:05,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1033570.0, ans=0.125 2024-08-11 09:46:07,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1033570.0, ans=0.1 2024-08-11 09:46:30,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1033770.0, ans=0.025 2024-08-11 09:46:31,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-08-11 09:46:34,256 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 09:46:40,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1033770.0, ans=0.09899494936611666 2024-08-11 09:46:46,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1033870.0, ans=0.2 2024-08-11 09:46:58,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.48 vs. 
limit=15.0 2024-08-11 09:47:00,650 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 1950, loss[loss=0.08987, beats_loss=0.009898, ecapa_loss=0.0001841, whisper_loss=0.07813, over 16051.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01114, ecapa_loss=0.0001964, whisper_loss=0.09253, over 3785121.51 frames. ], batch size: 60, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:47:02,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1033970.0, ans=0.125 2024-08-11 09:47:05,029 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.682e+01 2.998e+01 3.589e+01 5.098e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 09:47:06,161 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=15.0 2024-08-11 09:47:20,053 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-08-11 09:47:26,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2024-08-11 09:47:37,632 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. 
limit=15.0 2024-08-11 09:47:47,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1034270.0, ans=0.0 2024-08-11 09:48:11,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1034370.0, ans=0.2 2024-08-11 09:48:13,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2000, loss[loss=0.07819, beats_loss=0.01089, ecapa_loss=0.0002094, whisper_loss=0.0652, over 17375.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01126, ecapa_loss=0.0001971, whisper_loss=0.09222, over 3812646.01 frames. ], batch size: 68, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:48:17,760 WARNING [optim.py:496] (2/4) Scaling gradients by 0.059571195393800735, model_norm_threshold=59.96577072143555 2024-08-11 09:48:17,968 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.97, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.877e+05, grad_sumsq=1.108e+05, orig_rms_sq=8.917e+00 2024-08-11 09:48:51,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1034670.0, ans=0.125 2024-08-11 09:48:54,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1034670.0, ans=0.2 2024-08-11 09:48:57,444 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. 
limit=15.0 2024-08-11 09:48:57,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1034770.0, ans=10.0 2024-08-11 09:49:27,840 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2050, loss[loss=0.1136, beats_loss=0.01233, ecapa_loss=0.0001723, whisper_loss=0.09953, over 19045.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01123, ecapa_loss=0.0001966, whisper_loss=0.09268, over 3822104.36 frames. ], batch size: 72, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:49:30,768 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 14 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 09:49:31,758 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.681e+01 2.944e+01 3.350e+01 1.007e+03, threshold=5.888e+01, percent-clipped=2.0 2024-08-11 09:49:40,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1034970.0, ans=0.125 2024-08-11 09:49:41,345 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 09:49:47,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1035070.0, ans=0.125 2024-08-11 09:50:04,060 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-11 09:50:22,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1035270.0, ans=15.0 2024-08-11 09:50:29,203 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 09:50:40,722 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2100, loss[loss=0.1212, beats_loss=0.01086, ecapa_loss=0.0002066, whisper_loss=0.1082, over 23221.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.01129, ecapa_loss=0.000195, whisper_loss=0.09271, over 3809358.85 frames. ], batch size: 93, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:50:55,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1035570.0, ans=0.125 2024-08-11 09:51:07,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1035570.0, ans=0.125 2024-08-11 09:51:11,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1035670.0, ans=0.05 2024-08-11 09:51:17,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1035670.0, ans=0.0 2024-08-11 09:51:22,742 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 09:51:25,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1035770.0, ans=0.0 2024-08-11 09:51:29,964 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 09:51:32,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1035770.0, ans=0.0 2024-08-11 09:51:37,436 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 09:51:38,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.34 vs. limit=15.0 2024-08-11 09:51:54,028 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2150, loss[loss=0.09311, beats_loss=0.01321, ecapa_loss=0.0001448, whisper_loss=0.07845, over 18561.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01136, ecapa_loss=0.0001951, whisper_loss=0.09249, over 3806655.98 frames. 
], batch size: 70, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:51:58,116 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.546e+01 2.848e+01 3.381e+01 6.507e+01, threshold=5.695e+01, percent-clipped=3.0 2024-08-11 09:52:01,748 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 09:52:23,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1036170.0, ans=0.2 2024-08-11 09:52:38,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1036270.0, ans=0.2 2024-08-11 09:52:44,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1036270.0, ans=0.1 2024-08-11 09:52:46,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-11 09:52:46,867 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 38 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 09:52:48,154 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=15.0 2024-08-11 09:53:06,858 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2200, loss[loss=0.112, beats_loss=0.0122, ecapa_loss=0.0001823, whisper_loss=0.09801, over 23574.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01135, ecapa_loss=0.0001953, whisper_loss=0.09255, over 3785820.59 frames. ], batch size: 93, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:53:09,800 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 09:53:10,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.68 vs. limit=22.5 2024-08-11 09:53:21,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1036570.0, ans=0.05 2024-08-11 09:53:24,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1036570.0, ans=0.125 2024-08-11 09:53:27,069 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-11 09:53:42,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1036670.0, ans=0.5 2024-08-11 09:53:56,330 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 09:53:57,648 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-11 09:54:10,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1036870.0, ans=0.125 2024-08-11 09:54:15,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2250, loss[loss=0.1279, beats_loss=0.009144, ecapa_loss=0.0002102, whisper_loss=0.1166, over 20458.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0114, ecapa_loss=0.000196, whisper_loss=0.09327, over 3800935.66 frames. 
], batch size: 81, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:54:19,421 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.681e+01 2.914e+01 3.367e+01 5.391e+01, threshold=5.828e+01, percent-clipped=0.0 2024-08-11 09:54:31,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1037070.0, ans=0.5 2024-08-11 09:54:53,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1037170.0, ans=0.2 2024-08-11 09:55:03,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1037270.0, ans=0.0 2024-08-11 09:55:04,691 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-11 09:55:13,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1037370.0, ans=0.125 2024-08-11 09:55:21,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2300, loss[loss=0.1199, beats_loss=0.01167, ecapa_loss=0.0001834, whisper_loss=0.1064, over 22422.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01143, ecapa_loss=0.0001976, whisper_loss=0.09288, over 3839040.71 frames. ], batch size: 89, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:55:23,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1037470.0, ans=0.125 2024-08-11 09:55:24,341 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
13 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 09:55:24,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1037470.0, ans=0.0 2024-08-11 09:55:31,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1037470.0, ans=0.125 2024-08-11 09:55:32,157 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.03 vs. limit=6.0 2024-08-11 09:55:56,844 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 09:56:02,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1037770.0, ans=0.1 2024-08-11 09:56:06,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1037770.0, ans=0.0 2024-08-11 09:56:14,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1037870.0, ans=0.125 2024-08-11 09:56:27,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2350, loss[loss=0.1119, beats_loss=0.01255, ecapa_loss=0.000237, whisper_loss=0.09694, over 18186.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0113, ecapa_loss=0.0001994, whisper_loss=0.09316, over 3804478.35 frames. 
], batch size: 76, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:56:27,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1037970.0, ans=0.0 2024-08-11 09:56:29,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1037970.0, ans=0.1 2024-08-11 09:56:30,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1037970.0, ans=0.125 2024-08-11 09:56:31,682 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.661e+01 3.016e+01 3.402e+01 1.211e+02, threshold=6.032e+01, percent-clipped=3.0 2024-08-11 09:56:32,453 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-11 09:56:39,703 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 09:56:42,459 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 09:56:55,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1038170.0, ans=0.125 2024-08-11 09:57:16,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1038270.0, ans=0.125 2024-08-11 09:57:18,971 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 09:57:33,245 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2400, loss[loss=0.09724, beats_loss=0.01302, ecapa_loss=0.0001891, whisper_loss=0.08233, over 16880.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01121, ecapa_loss=0.0001992, whisper_loss=0.09356, over 3818616.02 frames. 
], batch size: 66, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:57:41,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1038470.0, ans=0.09899494936611666 2024-08-11 09:57:42,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.35 vs. limit=6.0 2024-08-11 09:57:47,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1038570.0, ans=0.0 2024-08-11 09:58:28,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1038870.0, ans=0.125 2024-08-11 09:58:33,813 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 09:58:34,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1038870.0, ans=0.2 2024-08-11 09:58:39,137 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2450, loss[loss=0.09984, beats_loss=0.01367, ecapa_loss=0.0001722, whisper_loss=0.08445, over 14259.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01128, ecapa_loss=0.0001977, whisper_loss=0.09392, over 3834763.52 frames. ], batch size: 57, lr: 8.07e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:58:41,860 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 09:58:43,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.701e+01 2.979e+01 3.423e+01 5.204e+01, threshold=5.958e+01, percent-clipped=0.0 2024-08-11 09:58:44,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1038970.0, ans=0.0 2024-08-11 09:58:51,230 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 09:58:51,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1039070.0, ans=0.125 2024-08-11 09:58:55,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1039070.0, ans=0.1 2024-08-11 09:58:56,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1039070.0, ans=0.125 2024-08-11 09:58:56,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1039070.0, ans=0.0 2024-08-11 09:59:34,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1039370.0, ans=0.0 2024-08-11 09:59:44,195 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2500, loss[loss=0.1003, beats_loss=0.01269, ecapa_loss=0.0001741, whisper_loss=0.08587, over 18835.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01125, ecapa_loss=0.0001991, whisper_loss=0.09366, over 3838085.45 frames. ], batch size: 77, lr: 8.07e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:59:46,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2024-08-11 10:00:12,108 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 31 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 10:00:13,380 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 10:00:14,953 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.619e-02 2024-08-11 10:00:25,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.82 vs. 
limit=22.5 2024-08-11 10:00:26,170 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-11 10:00:38,336 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 10:00:38,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1039870.0, ans=0.125 2024-08-11 10:00:42,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1039870.0, ans=0.125 2024-08-11 10:00:49,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2550, loss[loss=0.08272, beats_loss=0.01417, ecapa_loss=0.000216, whisper_loss=0.0664, over 16657.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01125, ecapa_loss=0.000199, whisper_loss=0.09439, over 3874017.02 frames. ], batch size: 71, lr: 8.07e-03, grad_scale: 7.205759403792794e+16 2024-08-11 10:00:56,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1039970.0, ans=0.2 2024-08-11 10:00:57,242 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.767e+01 3.292e+01 3.693e+01 5.376e+01, threshold=6.584e+01, percent-clipped=0.0 2024-08-11 10:01:06,303 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 10:01:07,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.40 vs. limit=10.0 2024-08-11 10:01:15,126 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.39 vs. limit=22.5 2024-08-11 10:01:24,036 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.78 vs. 
limit=10.0 2024-08-11 10:01:27,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1040170.0, ans=0.2 2024-08-11 10:01:30,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1040170.0, ans=0.125 2024-08-11 10:01:34,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1040270.0, ans=0.2 2024-08-11 10:01:43,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1040270.0, ans=0.05 2024-08-11 10:01:59,626 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2600, loss[loss=0.1075, beats_loss=0.01284, ecapa_loss=0.0001881, whisper_loss=0.09276, over 20457.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01139, ecapa_loss=0.0001989, whisper_loss=0.09363, over 3870570.92 frames. ], batch size: 82, lr: 8.07e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:02:19,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1040570.0, ans=0.0 2024-08-11 10:02:22,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=1040570.0, ans=12.0 2024-08-11 10:02:31,162 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-11 10:02:33,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1040670.0, ans=0.125 2024-08-11 10:02:38,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. 
limit=10.0 2024-08-11 10:03:01,202 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.774e+00 2024-08-11 10:03:05,884 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2650, loss[loss=0.1009, beats_loss=0.01103, ecapa_loss=0.000171, whisper_loss=0.08818, over 22867.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01136, ecapa_loss=0.0001991, whisper_loss=0.09343, over 3895243.35 frames. ], batch size: 90, lr: 8.07e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:03:09,676 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.707e+01 2.925e+01 3.318e+01 6.568e+01, threshold=5.849e+01, percent-clipped=0.0 2024-08-11 10:03:19,577 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2024-08-11 10:03:28,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1041070.0, ans=0.1 2024-08-11 10:03:31,847 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 10:03:40,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1041170.0, ans=0.125 2024-08-11 10:03:40,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1041170.0, ans=0.125 2024-08-11 10:03:45,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1041270.0, ans=0.09899494936611666 2024-08-11 10:03:54,759 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 10:04:11,338 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2700, loss[loss=0.09906, beats_loss=0.01437, ecapa_loss=0.0002041, whisper_loss=0.08265, over 20976.00 frames. 
], tot_loss[loss=0.1057, beats_loss=0.01142, ecapa_loss=0.0001998, whisper_loss=0.09225, over 3901194.22 frames. ], batch size: 87, lr: 8.07e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:04:30,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1041570.0, ans=0.0 2024-08-11 10:04:41,009 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 10:04:45,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1041670.0, ans=0.0 2024-08-11 10:04:46,763 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0 2024-08-11 10:05:02,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1041770.0, ans=0.125 2024-08-11 10:05:14,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1041870.0, ans=0.125 2024-08-11 10:05:18,106 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2750, loss[loss=0.09378, beats_loss=0.01046, ecapa_loss=0.0001781, whisper_loss=0.08154, over 18198.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01142, ecapa_loss=0.000199, whisper_loss=0.09241, over 3876736.12 frames. ], batch size: 70, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:05:21,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1041970.0, ans=0.0 2024-08-11 10:05:22,013 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.665e+01 2.980e+01 3.281e+01 5.234e+01, threshold=5.959e+01, percent-clipped=0.0 2024-08-11 10:05:24,920 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
21 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 10:05:38,123 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 10:05:43,303 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 10:05:52,810 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 10:06:03,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1042270.0, ans=0.125 2024-08-11 10:06:08,559 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 10:06:15,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1042370.0, ans=0.015 2024-08-11 10:06:24,096 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2800, loss[loss=0.1016, beats_loss=0.009858, ecapa_loss=0.0001911, whisper_loss=0.08983, over 22063.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01134, ecapa_loss=0.0002002, whisper_loss=0.09285, over 3841461.64 frames. ], batch size: 89, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:06:34,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1042470.0, ans=0.0 2024-08-11 10:06:48,888 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 10:06:54,329 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-11 10:06:57,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1042670.0, ans=0.025 2024-08-11 10:06:58,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1042670.0, ans=0.125 2024-08-11 10:07:13,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1042770.0, ans=0.0 2024-08-11 10:07:18,303 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 36 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-11 10:07:25,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1042870.0, ans=0.2 2024-08-11 10:07:29,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2850, loss[loss=0.09927, beats_loss=0.0144, ecapa_loss=0.0001592, whisper_loss=0.08328, over 22780.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01144, ecapa_loss=0.0001984, whisper_loss=0.09288, over 3861124.29 frames. ], batch size: 90, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:07:33,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.749e+01 2.990e+01 3.438e+01 5.063e+01, threshold=5.981e+01, percent-clipped=0.0 2024-08-11 10:07:37,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1042970.0, ans=0.0 2024-08-11 10:07:41,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1043070.0, ans=0.0 2024-08-11 10:07:49,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1043070.0, ans=0.125 2024-08-11 10:07:51,112 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
18 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 10:08:01,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1043170.0, ans=0.0 2024-08-11 10:08:09,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1043270.0, ans=0.0 2024-08-11 10:08:20,070 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 10:08:31,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1043370.0, ans=0.1 2024-08-11 10:08:33,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-11 10:08:35,673 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2900, loss[loss=0.1113, beats_loss=0.009887, ecapa_loss=0.0001856, whisper_loss=0.09952, over 17990.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0114, ecapa_loss=0.0001999, whisper_loss=0.09257, over 3872541.24 frames. ], batch size: 67, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:08:36,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1043470.0, ans=0.125 2024-08-11 10:08:40,300 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 10:08:53,875 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.49 vs. limit=22.5 2024-08-11 10:08:55,885 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
21 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-11 10:09:08,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1043670.0, ans=0.125 2024-08-11 10:09:11,746 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 10:09:12,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1043670.0, ans=0.0 2024-08-11 10:09:22,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1043770.0, ans=0.125 2024-08-11 10:09:26,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1043770.0, ans=0.125 2024-08-11 10:09:42,036 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 2950, loss[loss=0.1175, beats_loss=0.008545, ecapa_loss=0.000253, whisper_loss=0.1064, over 21174.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01141, ecapa_loss=0.0002, whisper_loss=0.09238, over 3874747.67 frames. ], batch size: 88, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:09:42,237 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-11 10:09:45,976 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.608e+01 2.908e+01 3.326e+01 5.190e+01, threshold=5.815e+01, percent-clipped=0.0 2024-08-11 10:09:52,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1043970.0, ans=0.125 2024-08-11 10:10:11,300 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 10:10:15,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1044170.0, ans=0.0 2024-08-11 10:10:23,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=15.0 2024-08-11 10:10:37,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1044370.0, ans=0.0 2024-08-11 10:10:38,008 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.688e+00 2024-08-11 10:10:47,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3000, loss[loss=0.1151, beats_loss=0.01135, ecapa_loss=0.0001855, whisper_loss=0.1019, over 18527.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01145, ecapa_loss=0.0001993, whisper_loss=0.09241, over 3860240.06 frames. ], batch size: 73, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:10:47,948 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 10:11:27,160 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on ASR_libri: loss=0.2573, beats_loss=0, ecapa_loss=0.0006456, whisper_loss=0.2509, over 922467.00 frames. 2024-08-11 10:11:45,380 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on SV_voxceleb1: loss=0.005368, beats_loss=0, ecapa_loss=0.0005368, whisper_loss=0, over 939242.00 frames. 2024-08-11 10:13:42,458 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on AT_audioset: loss=0.02512, beats_loss=0.02512, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 10:13:42,468 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 10:13:54,814 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 10:13:56,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1044570.0, ans=0.2 2024-08-11 10:14:04,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.65 vs. limit=8.0 2024-08-11 10:14:07,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=15.0 2024-08-11 10:14:28,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1044770.0, ans=0.1 2024-08-11 10:14:32,914 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=12.0 2024-08-11 10:14:42,592 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2024-08-11 10:14:44,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1044870.0, ans=0.0 2024-08-11 10:14:49,862 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3050, loss[loss=0.1095, beats_loss=0.01314, ecapa_loss=0.0001899, whisper_loss=0.09446, over 22041.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0114, ecapa_loss=0.0001993, whisper_loss=0.09308, over 3905791.44 frames. ], batch size: 88, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:14:50,049 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 13 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 10:14:51,480 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
28 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-11 10:14:53,656 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.751e+01 3.093e+01 3.441e+01 4.563e+01, threshold=6.185e+01, percent-clipped=0.0 2024-08-11 10:14:58,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1044970.0, ans=0.1 2024-08-11 10:15:09,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1045070.0, ans=0.125 2024-08-11 10:15:09,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.75 vs. limit=12.0 2024-08-11 10:15:16,402 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 31 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 10:15:32,045 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.47 vs. limit=15.0 2024-08-11 10:15:32,860 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 10:15:41,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=15.0 2024-08-11 10:15:43,368 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-11 10:15:56,551 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3100, loss[loss=0.1306, beats_loss=0.008806, ecapa_loss=0.0002564, whisper_loss=0.1192, over 21979.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01141, ecapa_loss=0.0002004, whisper_loss=0.09243, over 3893296.07 frames. 
], batch size: 88, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:16:04,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1045470.0, ans=0.0 2024-08-11 10:16:05,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=22.5 2024-08-11 10:16:15,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1045570.0, ans=0.1 2024-08-11 10:16:16,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1045570.0, ans=0.0 2024-08-11 10:16:18,480 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.50 vs. limit=12.0 2024-08-11 10:16:23,165 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 10:16:36,444 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 10:16:39,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1045770.0, ans=0.0 2024-08-11 10:16:43,434 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.089e+00 2024-08-11 10:16:52,791 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
22 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 10:17:00,458 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0893079861998558, model_norm_threshold=61.852699279785156 2024-08-11 10:17:00,617 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.98, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.682e+05, grad_sumsq=5.221e+04, orig_rms_sq=8.968e+00 2024-08-11 10:17:03,148 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3150, loss[loss=0.1308, beats_loss=0.009622, ecapa_loss=0.0001923, whisper_loss=0.1192, over 24027.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01141, ecapa_loss=0.0002016, whisper_loss=0.09257, over 3878597.18 frames. ], batch size: 92, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:17:07,429 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.851e+01 3.278e+01 3.632e+01 6.926e+02, threshold=6.555e+01, percent-clipped=1.0 2024-08-11 10:17:08,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.86 vs. limit=15.0 2024-08-11 10:17:20,955 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 10:17:22,115 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 10:17:32,426 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.60 vs. limit=22.5 2024-08-11 10:17:33,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2024-08-11 10:17:41,537 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.81 vs. 
limit=15.0 2024-08-11 10:17:42,318 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 20 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-11 10:17:51,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1046270.0, ans=0.2 2024-08-11 10:18:03,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1046370.0, ans=0.2 2024-08-11 10:18:09,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3200, loss[loss=0.1093, beats_loss=0.01214, ecapa_loss=0.0002169, whisper_loss=0.09495, over 13828.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01128, ecapa_loss=0.0002023, whisper_loss=0.09416, over 3903983.65 frames. ], batch size: 56, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:18:15,850 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 10:18:23,271 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 10:18:23,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1046570.0, ans=0.1 2024-08-11 10:18:52,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1046770.0, ans=0.07 2024-08-11 10:18:56,475 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 10:18:57,758 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
26 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 10:19:00,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1046770.0, ans=0.1 2024-08-11 10:19:13,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2024-08-11 10:19:14,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1046870.0, ans=0.125 2024-08-11 10:19:16,380 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3250, loss[loss=0.1092, beats_loss=0.0109, ecapa_loss=0.0001928, whisper_loss=0.09641, over 19376.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01122, ecapa_loss=0.0002035, whisper_loss=0.09468, over 3898524.79 frames. ], batch size: 77, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:19:20,671 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.129e+01 2.734e+01 3.207e+01 3.832e+01 6.451e+01, threshold=6.414e+01, percent-clipped=0.0 2024-08-11 10:19:22,006 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 10:19:55,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1047270.0, ans=0.0 2024-08-11 10:19:58,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-08-11 10:20:04,261 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 10:20:22,951 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3300, loss[loss=0.1189, beats_loss=0.009386, ecapa_loss=0.0002334, whisper_loss=0.1071, over 14869.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01124, ecapa_loss=0.000205, whisper_loss=0.09522, over 3901070.66 frames. 
], batch size: 61, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:20:28,495 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 10:20:43,402 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 10:20:44,890 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 10:20:50,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1047670.0, ans=0.125 2024-08-11 10:20:54,350 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 27 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-11 10:20:58,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1047670.0, ans=0.2 2024-08-11 10:21:28,318 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.62 vs. limit=10.0 2024-08-11 10:21:30,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3350, loss[loss=0.1128, beats_loss=0.0106, ecapa_loss=0.0002213, whisper_loss=0.09999, over 18585.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01125, ecapa_loss=0.0002042, whisper_loss=0.09484, over 3882105.57 frames. 
], batch size: 74, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:21:30,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1047970.0, ans=0.0 2024-08-11 10:21:32,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1047970.0, ans=0.125 2024-08-11 10:21:34,474 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.287e+01 2.767e+01 3.123e+01 3.740e+01 5.333e+01, threshold=6.246e+01, percent-clipped=0.0 2024-08-11 10:21:35,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1047970.0, ans=0.125 2024-08-11 10:21:40,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1047970.0, ans=0.0 2024-08-11 10:21:41,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2024-08-11 10:22:02,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1048170.0, ans=0.0 2024-08-11 10:22:09,120 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 10:22:21,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1048270.0, ans=0.125 2024-08-11 10:22:28,012 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-11 10:22:37,342 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3400, loss[loss=0.106, beats_loss=0.01218, ecapa_loss=0.0001907, whisper_loss=0.09191, over 19923.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01132, ecapa_loss=0.0002033, whisper_loss=0.0949, over 3877386.15 frames. 
], batch size: 81, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:22:48,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1048470.0, ans=0.125 2024-08-11 10:23:03,481 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 10:23:06,097 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 10:23:20,238 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 18 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 10:23:32,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1048870.0, ans=0.1 2024-08-11 10:23:38,459 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 10:23:41,833 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.70 vs. limit=22.5 2024-08-11 10:23:46,472 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3450, loss[loss=0.1238, beats_loss=0.009259, ecapa_loss=0.0001955, whisper_loss=0.1126, over 16648.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01133, ecapa_loss=0.0002033, whisper_loss=0.09481, over 3894843.42 frames. ], batch size: 62, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:23:50,566 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.572e+01 2.937e+01 3.389e+01 1.105e+02, threshold=5.874e+01, percent-clipped=1.0 2024-08-11 10:23:58,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1048970.0, ans=0.1 2024-08-11 10:24:04,818 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 10:24:05,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1049070.0, ans=0.125 2024-08-11 10:24:26,824 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 10:24:29,392 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 10:24:36,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1049270.0, ans=0.04949747468305833 2024-08-11 10:24:48,530 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 10:24:54,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1049470.0, ans=0.125 2024-08-11 10:24:55,051 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3500, loss[loss=0.1262, beats_loss=0.009645, ecapa_loss=0.0002395, whisper_loss=0.1142, over 23048.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01136, ecapa_loss=0.0002035, whisper_loss=0.09474, over 3901059.51 frames. ], batch size: 93, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:24:56,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1049470.0, ans=0.2 2024-08-11 10:25:07,700 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 21 from LS+wenet, 15 from Vox, 49 fro AS 2024-08-11 10:25:07,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1049570.0, ans=0.0 2024-08-11 10:25:10,315 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 10:25:15,368 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 10:25:26,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1049670.0, ans=0.125 2024-08-11 10:25:32,067 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 10:25:43,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1049770.0, ans=0.125 2024-08-11 10:26:02,899 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3550, loss[loss=0.09151, beats_loss=0.01164, ecapa_loss=0.0001725, whisper_loss=0.07814, over 16702.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.0114, ecapa_loss=0.0002026, whisper_loss=0.09372, over 3918973.63 frames. ], batch size: 66, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:26:07,188 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.679e+01 2.987e+01 3.672e+01 5.992e+01, threshold=5.975e+01, percent-clipped=1.0 2024-08-11 10:26:22,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1050070.0, ans=0.125 2024-08-11 10:26:23,822 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-11 10:26:39,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1050170.0, ans=0.0 2024-08-11 10:26:53,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5 2024-08-11 10:26:54,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1050270.0, ans=0.1 2024-08-11 10:27:03,062 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 10:27:07,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1050370.0, ans=0.125 2024-08-11 10:27:08,754 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-11 10:27:10,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1050370.0, ans=0.125 2024-08-11 10:27:12,666 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3600, loss[loss=0.1218, beats_loss=0.01146, ecapa_loss=0.0001886, whisper_loss=0.1084, over 22581.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01129, ecapa_loss=0.0002015, whisper_loss=0.09479, over 3942231.90 frames. ], batch size: 88, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:28:22,181 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3650, loss[loss=0.0949, beats_loss=0.009039, ecapa_loss=0.0002373, whisper_loss=0.08349, over 16265.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01129, ecapa_loss=0.000202, whisper_loss=0.09447, over 3903614.35 frames. 
], batch size: 66, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:28:26,701 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.681e+01 3.041e+01 3.404e+01 5.123e+01, threshold=6.083e+01, percent-clipped=0.0 2024-08-11 10:28:48,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1051070.0, ans=0.07 2024-08-11 10:29:02,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1051170.0, ans=0.125 2024-08-11 10:29:26,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1051370.0, ans=0.0 2024-08-11 10:29:29,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1051370.0, ans=0.07 2024-08-11 10:29:33,273 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3700, loss[loss=0.1017, beats_loss=0.01211, ecapa_loss=0.000215, whisper_loss=0.08744, over 19548.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01133, ecapa_loss=0.0002005, whisper_loss=0.09429, over 3912267.55 frames. ], batch size: 83, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:29:35,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1051470.0, ans=0.125 2024-08-11 10:29:45,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1051470.0, ans=0.125 2024-08-11 10:30:04,467 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
16 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 10:30:10,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1051670.0, ans=0.125 2024-08-11 10:30:14,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1051670.0, ans=0.125 2024-08-11 10:30:29,947 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 10:30:33,791 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 10:30:45,299 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3750, loss[loss=0.09532, beats_loss=0.01314, ecapa_loss=0.0001747, whisper_loss=0.08043, over 16616.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01144, ecapa_loss=0.0001999, whisper_loss=0.09352, over 3885061.63 frames. ], batch size: 63, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:30:49,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.786e+01 3.057e+01 3.501e+01 5.299e+01, threshold=6.113e+01, percent-clipped=0.0 2024-08-11 10:31:19,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1052170.0, ans=0.125 2024-08-11 10:31:55,520 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3800, loss[loss=0.1021, beats_loss=0.01136, ecapa_loss=0.0002291, whisper_loss=0.08847, over 16756.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0115, ecapa_loss=0.0001998, whisper_loss=0.09336, over 3895973.82 frames. 
], batch size: 69, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:32:00,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1052470.0, ans=0.1 2024-08-11 10:32:00,813 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.21 vs. limit=15.0 2024-08-11 10:32:08,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1052570.0, ans=0.1 2024-08-11 10:32:15,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1052570.0, ans=0.125 2024-08-11 10:32:24,920 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 10:32:32,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1052670.0, ans=0.0 2024-08-11 10:32:46,609 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 10:32:51,132 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 10:32:55,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1052870.0, ans=0.125 2024-08-11 10:32:56,068 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.95 vs. limit=22.5 2024-08-11 10:33:06,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3850, loss[loss=0.1284, beats_loss=0.008778, ecapa_loss=0.0002186, whisper_loss=0.1174, over 18659.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0115, ecapa_loss=0.0001986, whisper_loss=0.09344, over 3874294.22 frames. 
], batch size: 72, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:33:09,626 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 10:33:09,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1052970.0, ans=0.125 2024-08-11 10:33:10,675 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+01 2.764e+01 3.232e+01 3.837e+01 5.936e+01, threshold=6.465e+01, percent-clipped=0.0 2024-08-11 10:33:14,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1052970.0, ans=0.1 2024-08-11 10:33:18,000 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 10:33:28,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=1053070.0, ans=0.1 2024-08-11 10:33:36,292 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 28 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 10:33:44,743 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 10:33:51,917 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.22 vs. limit=15.0 2024-08-11 10:33:54,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1053270.0, ans=0.09899494936611666 2024-08-11 10:33:55,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1053270.0, ans=0.125 2024-08-11 10:33:56,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.40 vs. 
limit=15.0 2024-08-11 10:34:16,332 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3900, loss[loss=0.1089, beats_loss=0.01021, ecapa_loss=0.0002047, whisper_loss=0.09662, over 19132.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01134, ecapa_loss=0.0002006, whisper_loss=0.09426, over 3893187.23 frames. ], batch size: 75, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:34:23,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1053470.0, ans=0.125 2024-08-11 10:34:24,029 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2024-08-11 10:34:39,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1053570.0, ans=0.125 2024-08-11 10:34:40,478 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 10:34:44,726 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.61 vs. limit=22.5 2024-08-11 10:35:14,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1053870.0, ans=0.1 2024-08-11 10:35:21,886 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 10:35:26,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1053970.0, ans=0.125 2024-08-11 10:35:27,685 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 3950, loss[loss=0.09656, beats_loss=0.01098, ecapa_loss=0.0001917, whisper_loss=0.08367, over 16862.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01131, ecapa_loss=0.0002024, whisper_loss=0.09459, over 3903855.01 frames. 
], batch size: 64, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:35:28,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1053970.0, ans=0.2 2024-08-11 10:35:30,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.97 vs. limit=10.0 2024-08-11 10:35:32,094 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.817e+01 3.170e+01 3.825e+01 1.516e+02, threshold=6.340e+01, percent-clipped=2.0 2024-08-11 10:35:47,605 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 10:35:55,398 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 10:36:01,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1054170.0, ans=0.125 2024-08-11 10:36:16,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1054270.0, ans=0.0 2024-08-11 10:36:29,608 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 10:36:33,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1054370.0, ans=0.0 2024-08-11 10:36:39,148 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-11 10:36:42,816 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4000, loss[loss=0.0913, beats_loss=0.01241, ecapa_loss=0.0001969, whisper_loss=0.07692, over 21965.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01131, ecapa_loss=0.0002016, whisper_loss=0.09423, over 3899253.96 frames. 
], batch size: 92, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:37:00,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1054570.0, ans=0.5 2024-08-11 10:37:01,529 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 10:37:04,473 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 10:37:15,984 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 10:37:16,947 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2024-08-11 10:37:18,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1054670.0, ans=0.125 2024-08-11 10:37:18,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1054670.0, ans=0.125 2024-08-11 10:37:44,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.73 vs. limit=15.0 2024-08-11 10:37:58,490 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4050, loss[loss=0.1018, beats_loss=0.01365, ecapa_loss=0.0001823, whisper_loss=0.08637, over 22309.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01129, ecapa_loss=0.0002018, whisper_loss=0.09427, over 3900208.87 frames. 
], batch size: 91, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:38:03,810 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.646e+01 2.921e+01 3.336e+01 5.282e+01, threshold=5.841e+01, percent-clipped=0.0 2024-08-11 10:38:11,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1054970.0, ans=0.0 2024-08-11 10:38:16,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1055070.0, ans=0.125 2024-08-11 10:38:20,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1055070.0, ans=0.0 2024-08-11 10:38:21,105 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 10:38:25,864 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-11 10:38:33,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1055170.0, ans=0.2 2024-08-11 10:38:46,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1055270.0, ans=0.125 2024-08-11 10:38:50,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1055270.0, ans=0.2 2024-08-11 10:38:51,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1055270.0, ans=0.125 2024-08-11 10:38:53,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1055270.0, ans=0.125 2024-08-11 10:38:53,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1055270.0, 
ans=0.125 2024-08-11 10:39:12,253 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.563e-03 2024-08-11 10:39:15,081 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4100, loss[loss=0.06911, beats_loss=0.01352, ecapa_loss=0.0001819, whisper_loss=0.05378, over 19261.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01133, ecapa_loss=0.0002013, whisper_loss=0.09384, over 3899357.26 frames. ], batch size: 79, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:39:27,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1055470.0, ans=0.125 2024-08-11 10:39:31,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1055570.0, ans=0.1 2024-08-11 10:40:05,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1055770.0, ans=0.0 2024-08-11 10:40:06,635 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 10:40:09,902 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 10:40:19,290 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 10:40:28,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1055870.0, ans=0.125 2024-08-11 10:40:28,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1055870.0, ans=0.1 2024-08-11 10:40:34,085 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4150, loss[loss=0.1063, beats_loss=0.01202, ecapa_loss=0.0002796, whisper_loss=0.09152, over 21830.00 frames. 
], tot_loss[loss=0.1073, beats_loss=0.01138, ecapa_loss=0.0002016, whisper_loss=0.09392, over 3883388.65 frames. ], batch size: 94, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:40:38,418 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.691e+01 3.023e+01 3.383e+01 1.135e+02, threshold=6.046e+01, percent-clipped=2.0 2024-08-11 10:40:39,988 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 10:40:40,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1055970.0, ans=0.1 2024-08-11 10:40:46,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1055970.0, ans=0.2 2024-08-11 10:40:47,832 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-11 10:40:57,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1056070.0, ans=0.125 2024-08-11 10:41:02,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1056070.0, ans=0.125 2024-08-11 10:41:12,870 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 10:41:14,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1056170.0, ans=0.07 2024-08-11 10:41:16,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1056170.0, ans=0.2 2024-08-11 10:41:17,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1056270.0, ans=0.125 2024-08-11 10:41:27,352 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 10:41:40,088 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.14 vs. limit=22.5 2024-08-11 10:41:45,162 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 10:41:48,024 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4200, loss[loss=0.09419, beats_loss=0.0132, ecapa_loss=0.0001987, whisper_loss=0.079, over 20235.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01138, ecapa_loss=0.0002005, whisper_loss=0.09421, over 3900013.62 frames. ], batch size: 86, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:41:50,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1056470.0, ans=0.0 2024-08-11 10:42:00,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1056470.0, ans=0.0 2024-08-11 10:42:06,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1056570.0, ans=0.125 2024-08-11 10:42:13,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1056570.0, ans=10.0 2024-08-11 10:42:18,174 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 10:42:23,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1056670.0, ans=0.125 2024-08-11 10:42:49,924 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 37 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 10:43:02,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4250, loss[loss=0.1038, beats_loss=0.009008, ecapa_loss=0.0002332, whisper_loss=0.09242, over 20253.00 frames. 
], tot_loss[loss=0.1072, beats_loss=0.01145, ecapa_loss=0.0002012, whisper_loss=0.09376, over 3928765.80 frames. ], batch size: 78, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:43:07,330 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.666e+01 2.925e+01 3.281e+01 5.407e+01, threshold=5.850e+01, percent-clipped=0.0 2024-08-11 10:43:07,516 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 15 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 10:43:20,246 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 10:43:26,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1057070.0, ans=0.025 2024-08-11 10:43:30,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1057170.0, ans=0.0 2024-08-11 10:43:36,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1057170.0, ans=0.2 2024-08-11 10:43:55,468 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 10:44:07,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1057370.0, ans=0.125 2024-08-11 10:44:08,825 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 10:44:16,389 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4300, loss[loss=0.08187, beats_loss=0.01214, ecapa_loss=0.0001547, whisper_loss=0.06818, over 13891.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01142, ecapa_loss=0.0001994, whisper_loss=0.09312, over 3884074.42 frames. 
], batch size: 54, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:44:27,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1057470.0, ans=0.025 2024-08-11 10:44:32,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1057570.0, ans=0.1 2024-08-11 10:44:57,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1057670.0, ans=0.2 2024-08-11 10:45:00,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1057670.0, ans=0.1 2024-08-11 10:45:17,154 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.31 vs. limit=10.0 2024-08-11 10:45:34,058 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4350, loss[loss=0.08776, beats_loss=0.01239, ecapa_loss=0.0001683, whisper_loss=0.07369, over 21212.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01139, ecapa_loss=0.0001989, whisper_loss=0.09276, over 3876196.70 frames. ], batch size: 86, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:45:37,092 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 10:45:38,580 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.592e+01 2.860e+01 3.306e+01 4.790e+01, threshold=5.719e+01, percent-clipped=0.0 2024-08-11 10:45:47,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1058070.0, ans=0.2 2024-08-11 10:46:16,532 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-11 10:46:30,809 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
15 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 10:46:47,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1058370.0, ans=0.125 2024-08-11 10:46:51,163 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4400, loss[loss=0.1258, beats_loss=0.01069, ecapa_loss=0.0001927, whisper_loss=0.1132, over 23956.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01145, ecapa_loss=0.0001982, whisper_loss=0.09307, over 3873218.15 frames. ], batch size: 93, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:46:55,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1058470.0, ans=0.0 2024-08-11 10:46:56,396 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 10:46:56,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1058470.0, ans=0.125 2024-08-11 10:47:05,815 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 10:47:18,495 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=6.598e-02 2024-08-11 10:47:27,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1058670.0, ans=0.125 2024-08-11 10:47:58,995 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-11 10:48:10,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1058870.0, ans=10.0 2024-08-11 10:48:13,487 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4450, loss[loss=0.08967, beats_loss=0.01561, ecapa_loss=0.0001732, whisper_loss=0.07233, over 22337.00 frames. 
], tot_loss[loss=0.1064, beats_loss=0.01145, ecapa_loss=0.0001979, whisper_loss=0.09298, over 3872008.30 frames. ], batch size: 90, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:48:17,634 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.805e+01 3.007e+01 3.333e+01 6.979e+01, threshold=6.014e+01, percent-clipped=1.0 2024-08-11 10:48:22,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1058970.0, ans=0.1 2024-08-11 10:48:29,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1059070.0, ans=0.0 2024-08-11 10:48:38,579 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 10:48:50,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059170.0, ans=0.1 2024-08-11 10:49:01,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1059270.0, ans=0.0 2024-08-11 10:49:06,212 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 10:49:10,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-08-11 10:49:20,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1059370.0, ans=0.125 2024-08-11 10:49:21,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1059370.0, ans=0.0 2024-08-11 10:49:29,631 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4500, loss[loss=0.1017, beats_loss=0.01133, ecapa_loss=0.0002447, whisper_loss=0.08794, over 20104.00 frames. 
], tot_loss[loss=0.1062, beats_loss=0.01137, ecapa_loss=0.000198, whisper_loss=0.09283, over 3869056.69 frames. ], batch size: 84, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:50:08,300 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 10:50:08,610 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.508e-01 2024-08-11 10:50:10,331 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 10:50:24,121 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 10:50:28,119 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-11 10:50:41,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1059870.0, ans=0.125 2024-08-11 10:50:44,120 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4550, loss[loss=0.1152, beats_loss=0.009897, ecapa_loss=0.0002157, whisper_loss=0.1032, over 23117.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01136, ecapa_loss=0.0001998, whisper_loss=0.09262, over 3886427.25 frames. ], batch size: 92, lr: 7.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:50:48,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.557e+01 2.865e+01 3.375e+01 6.211e+01, threshold=5.730e+01, percent-clipped=1.0 2024-08-11 10:50:49,807 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0 2024-08-11 10:50:55,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059970.0, ans=0.1 2024-08-11 10:50:58,523 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
21 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 10:50:58,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1060070.0, ans=0.2 2024-08-11 10:51:00,635 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.42 vs. limit=10.0 2024-08-11 10:51:16,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1060170.0, ans=0.125 2024-08-11 10:51:19,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0 2024-08-11 10:51:25,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1060170.0, ans=0.125 2024-08-11 10:51:35,157 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 10:51:54,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1060370.0, ans=0.125 2024-08-11 10:51:56,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2024-08-11 10:51:57,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1060370.0, ans=0.1 2024-08-11 10:52:00,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4600, loss[loss=0.1304, beats_loss=0.01051, ecapa_loss=0.0001862, whisper_loss=0.118, over 20131.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01129, ecapa_loss=0.0002009, whisper_loss=0.09326, over 3917834.97 frames. 
], batch size: 78, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:52:00,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1060470.0, ans=0.125 2024-08-11 10:52:05,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1060470.0, ans=0.125 2024-08-11 10:52:07,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=12.0 2024-08-11 10:52:10,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1060470.0, ans=0.1 2024-08-11 10:52:14,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1060570.0, ans=0.1 2024-08-11 10:52:18,507 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 10:52:38,846 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 10:52:55,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1060770.0, ans=0.0 2024-08-11 10:53:02,137 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.64 vs. limit=10.0 2024-08-11 10:53:09,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1060870.0, ans=0.035 2024-08-11 10:53:12,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1060870.0, ans=0.0 2024-08-11 10:53:14,213 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 10:53:21,147 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4650, loss[loss=0.09868, beats_loss=0.008306, ecapa_loss=0.0002793, whisper_loss=0.08759, over 18763.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0113, ecapa_loss=0.0001998, whisper_loss=0.09275, over 3930612.86 frames. ], batch size: 78, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:53:26,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.723e+01 3.113e+01 3.495e+01 7.663e+01, threshold=6.226e+01, percent-clipped=1.0 2024-08-11 10:53:47,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1061070.0, ans=0.0 2024-08-11 10:53:54,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1061170.0, ans=0.0 2024-08-11 10:53:58,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1061170.0, ans=0.1 2024-08-11 10:54:01,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1061170.0, ans=0.125 2024-08-11 10:54:22,136 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 10:54:24,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5 2024-08-11 10:54:28,551 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-11 10:54:28,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1061370.0, ans=0.025 2024-08-11 10:54:42,665 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4700, loss[loss=0.09111, beats_loss=0.01179, ecapa_loss=0.0001977, whisper_loss=0.07734, over 18693.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01127, ecapa_loss=0.0002009, whisper_loss=0.09359, over 3919994.41 frames. ], batch size: 76, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:54:43,743 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-08-11 10:54:46,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1061470.0, ans=0.125 2024-08-11 10:54:52,880 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 10:54:58,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1061570.0, ans=0.0 2024-08-11 10:54:59,962 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 10:55:22,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1061670.0, ans=0.0 2024-08-11 10:55:22,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1061670.0, ans=0.2 2024-08-11 10:55:23,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1061670.0, ans=0.0 2024-08-11 10:55:40,170 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.25 vs. limit=10.0 2024-08-11 10:56:05,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4750, loss[loss=0.1218, beats_loss=0.009928, ecapa_loss=0.0002046, whisper_loss=0.1098, over 22814.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01131, ecapa_loss=0.0002008, whisper_loss=0.0933, over 3919002.28 frames. ], batch size: 88, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:56:10,313 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+01 2.759e+01 3.104e+01 3.569e+01 5.241e+01, threshold=6.207e+01, percent-clipped=0.0 2024-08-11 10:56:16,277 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-11 10:56:19,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1061970.0, ans=0.2 2024-08-11 10:56:25,765 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 10:56:46,425 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 10:56:53,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1062170.0, ans=0.125 2024-08-11 10:56:53,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1062170.0, ans=0.0 2024-08-11 10:57:06,144 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 10:57:15,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1062370.0, ans=0.125 2024-08-11 10:57:31,659 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4800, loss[loss=0.1249, beats_loss=0.01159, ecapa_loss=0.0001998, whisper_loss=0.1113, over 23131.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01136, ecapa_loss=0.0002005, whisper_loss=0.09338, over 3920112.31 frames. ], batch size: 91, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:57:37,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1062470.0, ans=0.0 2024-08-11 10:57:38,188 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 10:57:45,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1062470.0, ans=0.0 2024-08-11 10:57:49,299 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 10:57:51,540 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=61.63 vs. 
limit=22.5 2024-08-11 10:57:54,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1062570.0, ans=0.07 2024-08-11 10:58:10,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1062670.0, ans=0.125 2024-08-11 10:58:18,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.86 vs. limit=22.5 2024-08-11 10:58:19,421 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-11 10:58:26,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1062770.0, ans=0.2 2024-08-11 10:58:36,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1062770.0, ans=0.0 2024-08-11 10:58:38,286 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 10:58:41,330 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 10:58:50,071 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 10:58:54,504 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 10:58:55,812 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4850, loss[loss=0.106, beats_loss=0.01263, ecapa_loss=0.0001891, whisper_loss=0.0915, over 18101.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01136, ecapa_loss=0.0002013, whisper_loss=0.09351, over 3909502.19 frames. 
], batch size: 75, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:59:00,358 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.634e+01 3.190e+01 3.671e+01 5.547e+01, threshold=6.379e+01, percent-clipped=0.0 2024-08-11 10:59:33,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1063170.0, ans=0.125 2024-08-11 10:59:33,701 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2024-08-11 10:59:54,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1063270.0, ans=0.0 2024-08-11 11:00:15,114 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4900, loss[loss=0.1183, beats_loss=0.01039, ecapa_loss=0.0001757, whisper_loss=0.1061, over 23970.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01136, ecapa_loss=0.0001997, whisper_loss=0.09373, over 3894347.61 frames. ], batch size: 92, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:00:22,182 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-11 11:00:43,706 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 11:00:44,962 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 11:00:52,721 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 11:01:04,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1063770.0, ans=0.1 2024-08-11 11:01:09,265 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
10 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 11:01:09,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1063770.0, ans=0.125 2024-08-11 11:01:19,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1063870.0, ans=0.125 2024-08-11 11:01:22,286 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 11:01:37,397 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 4950, loss[loss=0.09769, beats_loss=0.01059, ecapa_loss=0.0002867, whisper_loss=0.08423, over 19860.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01138, ecapa_loss=0.0001993, whisper_loss=0.09271, over 3862425.98 frames. ], batch size: 89, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:01:43,721 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.682e+01 3.010e+01 3.354e+01 5.437e+01, threshold=6.020e+01, percent-clipped=0.0 2024-08-11 11:01:49,970 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.133e-01 2024-08-11 11:01:54,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1064070.0, ans=0.0 2024-08-11 11:02:03,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1064070.0, ans=0.125 2024-08-11 11:02:04,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1064070.0, ans=0.0 2024-08-11 11:02:20,304 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 11:02:20,923 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=15.0 2024-08-11 11:02:27,817 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 11:02:39,950 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 11:03:00,532 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5000, loss[loss=0.1156, beats_loss=0.01022, ecapa_loss=0.0002275, whisper_loss=0.1031, over 16611.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01138, ecapa_loss=0.0002002, whisper_loss=0.09276, over 3845716.53 frames. ], batch size: 66, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:03:02,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1064470.0, ans=0.0 2024-08-11 11:03:05,272 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 11:03:10,150 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 31 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 11:03:13,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1064470.0, ans=0.2 2024-08-11 11:03:35,019 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-11 11:03:39,952 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-11 11:03:43,245 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 11:03:45,318 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-11 11:03:56,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1064770.0, ans=0.2 2024-08-11 11:03:58,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1064770.0, ans=0.0 2024-08-11 11:03:58,847 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0 2024-08-11 11:04:06,750 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 11:04:15,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1064870.0, ans=0.125 2024-08-11 11:04:24,848 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5050, loss[loss=0.1011, beats_loss=0.01314, ecapa_loss=0.0001595, whisper_loss=0.08636, over 18274.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01143, ecapa_loss=0.0001992, whisper_loss=0.09284, over 3851437.46 frames. ], batch size: 72, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:04:30,023 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.593e+01 2.899e+01 3.463e+01 4.526e+01, threshold=5.797e+01, percent-clipped=0.0 2024-08-11 11:04:50,975 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 11:05:04,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1065170.0, ans=0.0 2024-08-11 11:05:08,411 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 11:05:47,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.23 vs. 
limit=22.5 2024-08-11 11:05:50,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1065370.0, ans=0.0 2024-08-11 11:05:52,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1065470.0, ans=0.125 2024-08-11 11:05:53,652 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5100, loss[loss=0.09677, beats_loss=0.01464, ecapa_loss=0.0001366, whisper_loss=0.08076, over 20837.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01146, ecapa_loss=0.0001972, whisper_loss=0.09261, over 3851689.52 frames. ], batch size: 81, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:06:20,224 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 11:06:38,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1065670.0, ans=0.0 2024-08-11 11:06:46,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1065770.0, ans=0.2 2024-08-11 11:06:54,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1065770.0, ans=0.125 2024-08-11 11:06:57,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1065770.0, ans=0.1 2024-08-11 11:06:58,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1065770.0, ans=0.1 2024-08-11 11:07:16,619 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5150, loss[loss=0.104, beats_loss=0.01383, ecapa_loss=0.0001458, whisper_loss=0.08875, over 22218.00 frames. 
], tot_loss[loss=0.1068, beats_loss=0.01142, ecapa_loss=0.0001959, whisper_loss=0.0934, over 3873331.75 frames. ], batch size: 87, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:07:22,841 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.743e+01 3.078e+01 3.597e+01 5.105e+01, threshold=6.156e+01, percent-clipped=0.0 2024-08-11 11:07:34,243 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-11 11:07:35,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1066070.0, ans=0.5 2024-08-11 11:08:13,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-08-11 11:08:15,307 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 11:08:24,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1066370.0, ans=0.2 2024-08-11 11:08:33,315 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5200, loss[loss=0.09623, beats_loss=0.01281, ecapa_loss=0.0001791, whisper_loss=0.08162, over 17159.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01146, ecapa_loss=0.0001957, whisper_loss=0.0931, over 3872303.33 frames. ], batch size: 68, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:08:45,017 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 11:08:45,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1066470.0, ans=0.0 2024-08-11 11:08:47,052 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
30 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 11:09:14,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1066670.0, ans=0.1 2024-08-11 11:09:16,216 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-11 11:09:32,038 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 11:09:42,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1066870.0, ans=0.0 2024-08-11 11:09:52,010 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5250, loss[loss=0.1033, beats_loss=0.01545, ecapa_loss=0.0001509, whisper_loss=0.08636, over 22929.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0115, ecapa_loss=0.0001969, whisper_loss=0.09257, over 3855966.55 frames. ], batch size: 91, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:09:54,303 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 8 from Vox, 39 fro AS 2024-08-11 11:09:57,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.555e+01 2.975e+01 3.407e+01 4.666e+01, threshold=5.951e+01, percent-clipped=0.0 2024-08-11 11:10:01,569 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 11:10:25,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1067170.0, ans=0.2 2024-08-11 11:10:29,234 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-11 11:10:43,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1067270.0, ans=0.2 2024-08-11 11:10:47,776 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 11:10:48,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1067270.0, ans=0.125 2024-08-11 11:10:53,681 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 11:10:58,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1067370.0, ans=0.125 2024-08-11 11:11:07,548 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-11 11:11:08,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0 2024-08-11 11:11:10,738 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5300, loss[loss=0.1217, beats_loss=0.01141, ecapa_loss=0.0002172, whisper_loss=0.1081, over 22080.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0114, ecapa_loss=0.0001987, whisper_loss=0.09351, over 3857663.03 frames. ], batch size: 89, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:11:18,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1067470.0, ans=0.125 2024-08-11 11:11:24,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1067470.0, ans=0.125 2024-08-11 11:11:44,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1067670.0, ans=0.125 2024-08-11 11:11:48,293 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 11:11:54,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1067770.0, ans=0.125 2024-08-11 11:12:00,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1067770.0, ans=0.05 2024-08-11 11:12:19,185 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-11 11:12:29,402 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5350, loss[loss=0.1054, beats_loss=0.01199, ecapa_loss=0.0001831, whisper_loss=0.09158, over 21478.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01133, ecapa_loss=0.0001975, whisper_loss=0.09346, over 3856313.93 frames. ], batch size: 85, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:12:36,274 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.785e+01 3.077e+01 3.493e+01 6.327e+01, threshold=6.155e+01, percent-clipped=1.0 2024-08-11 11:13:07,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1068070.0, ans=0.05 2024-08-11 11:13:25,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1068170.0, ans=0.0 2024-08-11 11:13:44,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1068270.0, ans=0.2 2024-08-11 11:14:15,128 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5400, loss[loss=0.09766, beats_loss=0.0136, ecapa_loss=0.0001729, whisper_loss=0.08233, over 21203.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01136, ecapa_loss=0.0001976, whisper_loss=0.09356, over 3855604.65 frames. 
], batch size: 87, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:14:15,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1068470.0, ans=0.125 2024-08-11 11:14:32,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1068570.0, ans=0.125 2024-08-11 11:14:42,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1068570.0, ans=0.125 2024-08-11 11:14:45,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1068570.0, ans=0.125 2024-08-11 11:15:19,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1068770.0, ans=0.0 2024-08-11 11:15:50,922 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5450, loss[loss=0.09806, beats_loss=0.01326, ecapa_loss=0.0001534, whisper_loss=0.08327, over 14068.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01134, ecapa_loss=0.0001972, whisper_loss=0.09396, over 3877517.05 frames. ], batch size: 54, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:15:57,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+01 2.869e+01 3.117e+01 3.592e+01 6.207e+01, threshold=6.234e+01, percent-clipped=1.0 2024-08-11 11:16:03,373 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.93 vs. 
limit=15.0 2024-08-11 11:16:47,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1069170.0, ans=0.0 2024-08-11 11:16:48,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1069170.0, ans=0.0 2024-08-11 11:16:52,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1069270.0, ans=0.1 2024-08-11 11:16:54,865 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-11 11:17:31,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1069370.0, ans=0.125 2024-08-11 11:17:35,749 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5500, loss[loss=0.1121, beats_loss=0.01257, ecapa_loss=0.0001829, whisper_loss=0.09772, over 16343.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01128, ecapa_loss=0.000198, whisper_loss=0.09391, over 3883078.23 frames. ], batch size: 63, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:17:49,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2024-08-11 11:18:04,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.33 vs. limit=15.0 2024-08-11 11:18:05,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1069570.0, ans=0.0 2024-08-11 11:18:55,565 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.32 vs. 
limit=15.0 2024-08-11 11:19:22,111 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5550, loss[loss=0.1122, beats_loss=0.01149, ecapa_loss=0.0001956, whisper_loss=0.09874, over 22969.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0112, ecapa_loss=0.0002, whisper_loss=0.09445, over 3874641.83 frames. ], batch size: 92, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:19:28,685 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.609e+01 2.954e+01 3.474e+01 6.484e+01, threshold=5.909e+01, percent-clipped=2.0 2024-08-11 11:19:46,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1070070.0, ans=0.125 2024-08-11 11:19:59,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1070070.0, ans=0.0 2024-08-11 11:20:00,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1070070.0, ans=0.1 2024-08-11 11:20:21,730 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 11:20:23,274 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 11:20:54,023 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5600, loss[loss=0.1223, beats_loss=0.01159, ecapa_loss=0.0001695, whisper_loss=0.109, over 23967.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01128, ecapa_loss=0.0002002, whisper_loss=0.09409, over 3906241.06 frames. ], batch size: 91, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:20:54,223 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-11 11:21:04,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1070470.0, ans=0.125 2024-08-11 11:21:20,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.37 vs. limit=15.0 2024-08-11 11:21:25,295 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-11 11:21:32,358 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 11:21:32,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1070670.0, ans=0.1 2024-08-11 11:21:47,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.87 vs. limit=10.0 2024-08-11 11:22:03,948 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 11:22:07,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5650, loss[loss=0.09609, beats_loss=0.01114, ecapa_loss=0.0001719, whisper_loss=0.08323, over 18195.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01135, ecapa_loss=0.0001991, whisper_loss=0.0934, over 3920387.27 frames. 
], batch size: 71, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:22:11,663 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.577e+01 2.929e+01 3.455e+01 8.964e+01, threshold=5.859e+01, percent-clipped=1.0 2024-08-11 11:22:20,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1071070.0, ans=0.125 2024-08-11 11:22:21,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1071070.0, ans=0.125 2024-08-11 11:22:33,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1071070.0, ans=0.2 2024-08-11 11:22:38,813 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0 2024-08-11 11:22:43,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1071170.0, ans=0.0 2024-08-11 11:22:48,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1071170.0, ans=0.1 2024-08-11 11:23:13,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1071370.0, ans=0.1 2024-08-11 11:23:25,668 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5700, loss[loss=0.1042, beats_loss=0.01005, ecapa_loss=0.000165, whisper_loss=0.09253, over 14391.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01131, ecapa_loss=0.000201, whisper_loss=0.09345, over 3910746.85 frames. 
], batch size: 53, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:23:45,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1071570.0, ans=0.125 2024-08-11 11:23:54,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1071670.0, ans=0.125 2024-08-11 11:23:56,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=12.0 2024-08-11 11:23:57,458 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.432e-03 2024-08-11 11:24:05,784 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 11:24:09,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1071770.0, ans=0.125 2024-08-11 11:24:11,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1071770.0, ans=0.2 2024-08-11 11:24:18,725 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 21 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-11 11:24:26,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1071870.0, ans=0.025 2024-08-11 11:24:41,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1071870.0, ans=0.09899494936611666 2024-08-11 11:24:43,799 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5750, loss[loss=0.1147, beats_loss=0.011, ecapa_loss=0.0001901, whisper_loss=0.1018, over 22671.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01117, ecapa_loss=0.0002018, whisper_loss=0.09445, over 3899656.90 frames. 
], batch size: 88, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:24:48,527 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.718e+01 3.107e+01 3.541e+01 5.804e+01, threshold=6.214e+01, percent-clipped=0.0 2024-08-11 11:24:53,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=12.0 2024-08-11 11:25:00,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1072070.0, ans=0.125 2024-08-11 11:25:02,634 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 11:25:05,676 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 11:25:10,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.51 vs. limit=22.5 2024-08-11 11:25:24,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1072170.0, ans=0.0 2024-08-11 11:25:24,940 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2024-08-11 11:25:42,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1072270.0, ans=0.125 2024-08-11 11:25:47,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.33 vs. limit=22.5 2024-08-11 11:25:54,142 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 11:26:02,980 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5800, loss[loss=0.1109, beats_loss=0.009955, ecapa_loss=0.0002563, whisper_loss=0.09836, over 18190.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01118, ecapa_loss=0.0002036, whisper_loss=0.09426, over 3897545.25 frames. ], batch size: 76, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:26:08,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1072470.0, ans=0.0 2024-08-11 11:26:11,635 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 14 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-11 11:26:15,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1072470.0, ans=0.0 2024-08-11 11:26:25,003 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-11 11:26:30,432 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 11:26:34,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1072670.0, ans=0.1 2024-08-11 11:26:35,955 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 11:26:42,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1072670.0, ans=0.125 2024-08-11 11:26:46,488 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
29 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 11:27:02,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1072870.0, ans=0.0 2024-08-11 11:27:08,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1072870.0, ans=0.125 2024-08-11 11:27:13,604 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.772e-01 2024-08-11 11:27:18,626 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5850, loss[loss=0.1111, beats_loss=0.009552, ecapa_loss=0.0002057, whisper_loss=0.0995, over 14044.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01129, ecapa_loss=0.000203, whisper_loss=0.09365, over 3868390.54 frames. ], batch size: 54, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:27:19,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1072970.0, ans=0.125 2024-08-11 11:27:22,891 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.902e+02 2024-08-11 11:27:23,675 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.774e+01 3.139e+01 3.627e+01 6.860e+01, threshold=6.277e+01, percent-clipped=1.0 2024-08-11 11:27:32,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1073070.0, ans=0.125 2024-08-11 11:27:34,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1073070.0, ans=0.0 2024-08-11 11:27:35,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1073070.0, ans=0.125 2024-08-11 11:27:41,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1073070.0, 
ans=0.2 2024-08-11 11:27:42,410 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 11:27:48,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1073170.0, ans=0.125 2024-08-11 11:28:06,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2024-08-11 11:28:19,403 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 17 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-11 11:28:19,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.50 vs. limit=15.0 2024-08-11 11:28:26,678 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=22.5 2024-08-11 11:28:31,157 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5900, loss[loss=0.1027, beats_loss=0.01145, ecapa_loss=0.0002221, whisper_loss=0.08905, over 20534.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01137, ecapa_loss=0.0002021, whisper_loss=0.09317, over 3850482.02 frames. ], batch size: 86, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:28:57,520 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
13 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-11 11:29:12,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1073670.0, ans=0.125 2024-08-11 11:29:15,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1073770.0, ans=0.125 2024-08-11 11:29:20,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1073770.0, ans=0.125 2024-08-11 11:29:25,366 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-11 11:29:25,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1073770.0, ans=0.1 2024-08-11 11:29:42,452 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 5950, loss[loss=0.109, beats_loss=0.009215, ecapa_loss=0.0002388, whisper_loss=0.09741, over 21606.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01138, ecapa_loss=0.0002022, whisper_loss=0.09287, over 3879767.87 frames. ], batch size: 89, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:29:47,387 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.700e+01 3.029e+01 3.647e+01 6.302e+01, threshold=6.057e+01, percent-clipped=1.0 2024-08-11 11:30:08,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.52 vs. limit=22.5 2024-08-11 11:30:09,059 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 11:30:26,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1074270.0, ans=0.0 2024-08-11 11:30:30,807 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
25 from LS+wenet, 16 from Vox, 15 fro AS 2024-08-11 11:30:37,773 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 11:30:38,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1074270.0, ans=0.1 2024-08-11 11:30:47,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1074370.0, ans=0.2 2024-08-11 11:30:56,104 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6000, loss[loss=0.1151, beats_loss=0.009624, ecapa_loss=0.0001996, whisper_loss=0.1035, over 22377.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01139, ecapa_loss=0.0002016, whisper_loss=0.09306, over 3879820.79 frames. ], batch size: 88, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:30:56,104 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 11:31:34,837 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on ASR_libri: loss=0.2586, beats_loss=0, ecapa_loss=0.0006404, whisper_loss=0.2522, over 922467.00 frames. 2024-08-11 11:31:52,409 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on SV_voxceleb1: loss=0.005252, beats_loss=0, ecapa_loss=0.0005252, whisper_loss=0, over 939242.00 frames. 2024-08-11 11:33:45,281 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on AT_audioset: loss=0.02539, beats_loss=0.02539, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 11:33:45,285 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 11:34:15,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1074670.0, ans=0.05 2024-08-11 11:34:17,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.64 vs. 
limit=22.5 2024-08-11 11:34:34,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1074770.0, ans=0.125 2024-08-11 11:34:42,352 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 11:34:56,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2024-08-11 11:34:59,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6050, loss[loss=0.1055, beats_loss=0.01286, ecapa_loss=0.0002104, whisper_loss=0.09053, over 15517.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01136, ecapa_loss=0.0002013, whisper_loss=0.09313, over 3887186.38 frames. ], batch size: 65, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:35:03,613 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+01 2.749e+01 3.055e+01 3.427e+01 5.083e+01, threshold=6.111e+01, percent-clipped=0.0 2024-08-11 11:35:10,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0 2024-08-11 11:35:23,265 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 11:35:29,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1075170.0, ans=0.0 2024-08-11 11:35:31,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1075170.0, ans=0.2 2024-08-11 11:35:48,616 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 11:35:49,348 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. 
limit=15.0 2024-08-11 11:36:04,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1075370.0, ans=0.1 2024-08-11 11:36:14,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6100, loss[loss=0.09094, beats_loss=0.01148, ecapa_loss=0.0001922, whisper_loss=0.07754, over 15278.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01134, ecapa_loss=0.0002018, whisper_loss=0.09344, over 3899808.99 frames. ], batch size: 60, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:36:18,886 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 11:36:27,293 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2024-08-11 11:36:29,625 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 11:36:30,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2024-08-11 11:36:58,958 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 13 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 11:37:12,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1075770.0, ans=0.0 2024-08-11 11:37:18,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1075870.0, ans=0.0 2024-08-11 11:37:21,331 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 11:37:30,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6150, loss[loss=0.08456, beats_loss=0.01579, ecapa_loss=0.0001776, whisper_loss=0.06699, over 21770.00 frames. 
], tot_loss[loss=0.1072, beats_loss=0.01131, ecapa_loss=0.000201, whisper_loss=0.09391, over 3913446.60 frames. ], batch size: 91, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:37:34,413 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.684e+01 3.005e+01 3.339e+01 4.754e+01, threshold=6.009e+01, percent-clipped=0.0 2024-08-11 11:37:37,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1075970.0, ans=0.1 2024-08-11 11:37:40,991 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 35 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 11:37:42,539 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 11:38:20,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1076270.0, ans=0.125 2024-08-11 11:38:24,813 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 11:38:43,409 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6200, loss[loss=0.09568, beats_loss=0.01268, ecapa_loss=0.000198, whisper_loss=0.08101, over 18645.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01137, ecapa_loss=0.0002002, whisper_loss=0.093, over 3901319.56 frames. ], batch size: 75, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:38:46,734 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 11:38:46,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1076470.0, ans=0.125 2024-08-11 11:38:50,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1076470.0, ans=0.05 2024-08-11 11:39:04,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1076570.0, ans=0.125 2024-08-11 11:39:04,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1076570.0, ans=0.0 2024-08-11 11:39:21,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1076670.0, ans=0.125 2024-08-11 11:39:30,184 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 11:39:59,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6250, loss[loss=0.1088, beats_loss=0.01107, ecapa_loss=0.0002052, whisper_loss=0.09567, over 15116.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01135, ecapa_loss=0.0002016, whisper_loss=0.09274, over 3891107.46 frames. ], batch size: 59, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:40:04,195 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+01 2.830e+01 2.972e+01 3.439e+01 5.876e+01, threshold=5.945e+01, percent-clipped=0.0 2024-08-11 11:40:19,113 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 11:40:31,840 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
19 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-11 11:40:44,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1077270.0, ans=0.125 2024-08-11 11:40:45,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1077270.0, ans=0.125 2024-08-11 11:41:04,537 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0 2024-08-11 11:41:12,790 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-11 11:41:13,127 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6300, loss[loss=0.1204, beats_loss=0.00901, ecapa_loss=0.0002983, whisper_loss=0.1084, over 20675.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01134, ecapa_loss=0.0002009, whisper_loss=0.09346, over 3896599.30 frames. ], batch size: 88, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:41:13,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1077470.0, ans=0.125 2024-08-11 11:41:25,729 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 11:41:39,069 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.96 vs. limit=15.0 2024-08-11 11:41:43,142 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 11:41:50,346 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 11:42:00,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1077770.0, ans=0.125 2024-08-11 11:42:22,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1077870.0, ans=0.0 2024-08-11 11:42:23,818 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 18 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-11 11:42:24,894 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6350, loss[loss=0.083, beats_loss=0.01353, ecapa_loss=0.0002111, whisper_loss=0.06735, over 20120.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01128, ecapa_loss=0.0002025, whisper_loss=0.09349, over 3876998.30 frames. ], batch size: 84, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:42:27,950 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-11 11:42:28,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1077970.0, ans=0.125 2024-08-11 11:42:29,314 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.640e+01 2.866e+01 3.160e+01 1.102e+02, threshold=5.732e+01, percent-clipped=1.0 2024-08-11 11:42:33,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1077970.0, ans=0.0 2024-08-11 11:42:53,543 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.23 vs. limit=15.0 2024-08-11 11:43:12,218 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
27 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-11 11:43:15,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1078270.0, ans=0.0 2024-08-11 11:43:15,926 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0 2024-08-11 11:43:20,469 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-11 11:43:20,674 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.892e-01 2024-08-11 11:43:24,725 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 11:43:32,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1078370.0, ans=0.5 2024-08-11 11:43:37,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1078370.0, ans=0.125 2024-08-11 11:43:39,566 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6400, loss[loss=0.09528, beats_loss=0.01149, ecapa_loss=0.0001837, whisper_loss=0.08196, over 14808.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01136, ecapa_loss=0.0002011, whisper_loss=0.09303, over 3871148.35 frames. ], batch size: 54, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:43:47,462 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 11:43:54,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.94 vs. 
limit=10.0 2024-08-11 11:43:59,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1078570.0, ans=0.0 2024-08-11 11:44:00,996 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 11:44:07,569 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-11 11:44:15,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1078670.0, ans=0.125 2024-08-11 11:44:20,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1078670.0, ans=0.2 2024-08-11 11:44:26,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-11 11:44:27,027 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 38 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 11:44:56,006 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6450, loss[loss=0.09807, beats_loss=0.01224, ecapa_loss=0.0001929, whisper_loss=0.0839, over 21787.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0114, ecapa_loss=0.0002007, whisper_loss=0.09289, over 3868151.76 frames. 
], batch size: 92, lr: 7.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:45:00,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1078970.0, ans=0.125 2024-08-11 11:45:01,156 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.754e+01 3.078e+01 3.674e+01 5.893e+01, threshold=6.156e+01, percent-clipped=1.0 2024-08-11 11:45:03,497 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.354e-01 2024-08-11 11:45:05,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1078970.0, ans=0.015 2024-08-11 11:45:06,319 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0 2024-08-11 11:45:08,420 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 11:45:11,193 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 11:45:32,432 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-11 11:45:54,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1079370.0, ans=0.1 2024-08-11 11:45:57,148 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0 2024-08-11 11:46:08,891 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6500, loss[loss=0.1227, beats_loss=0.01143, ecapa_loss=0.0001887, whisper_loss=0.1094, over 19143.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01142, ecapa_loss=0.0002003, whisper_loss=0.09355, over 3882290.75 frames. 
], batch size: 76, lr: 7.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:46:19,233 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 11:46:21,944 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 11:46:31,347 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 11:46:36,333 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-11 11:47:08,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1079870.0, ans=0.1 2024-08-11 11:47:20,290 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6550, loss[loss=0.1151, beats_loss=0.008058, ecapa_loss=0.0002379, whisper_loss=0.1047, over 20450.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01135, ecapa_loss=0.0001992, whisper_loss=0.09403, over 3887402.54 frames. ], batch size: 81, lr: 7.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:47:22,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1079970.0, ans=0.125 2024-08-11 11:47:27,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+01 2.781e+01 3.122e+01 3.450e+01 5.322e+01, threshold=6.243e+01, percent-clipped=0.0 2024-08-11 11:47:32,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2024-08-11 11:47:33,508 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-08-11 11:47:34,717 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
29 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 11:47:40,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1080070.0, ans=0.125 2024-08-11 11:47:40,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1080070.0, ans=0.125 2024-08-11 11:47:51,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1080070.0, ans=0.2 2024-08-11 11:48:18,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1080270.0, ans=0.0 2024-08-11 11:48:22,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1080370.0, ans=10.0 2024-08-11 11:48:23,569 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 11:48:37,008 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6600, loss[loss=0.1173, beats_loss=0.009344, ecapa_loss=0.000265, whisper_loss=0.1053, over 21138.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01138, ecapa_loss=0.0002001, whisper_loss=0.09421, over 3911931.11 frames. ], batch size: 90, lr: 7.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:48:46,877 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 11:49:01,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.75 vs. limit=15.0 2024-08-11 11:49:05,722 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.38 vs. limit=22.5 2024-08-11 11:49:16,375 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 11:49:22,338 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 11:49:23,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2024-08-11 11:49:26,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1080770.0, ans=0.125 2024-08-11 11:49:49,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1080970.0, ans=0.2 2024-08-11 11:49:50,398 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6650, loss[loss=0.1163, beats_loss=0.007869, ecapa_loss=0.0002149, whisper_loss=0.1063, over 17712.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01141, ecapa_loss=0.0002001, whisper_loss=0.09422, over 3905588.86 frames. ], batch size: 68, lr: 7.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:49:52,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1080970.0, ans=0.125 2024-08-11 11:49:54,562 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.681e+01 2.981e+01 3.448e+01 5.241e+01, threshold=5.962e+01, percent-clipped=0.0 2024-08-11 11:49:57,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1080970.0, ans=0.125 2024-08-11 11:50:02,112 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
37 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 11:50:04,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1081070.0, ans=0.125 2024-08-11 11:50:07,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1081070.0, ans=0.07 2024-08-11 11:50:15,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1081070.0, ans=10.0 2024-08-11 11:50:23,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-11 11:50:52,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1081370.0, ans=0.2 2024-08-11 11:51:01,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6700, loss[loss=0.06487, beats_loss=0.01553, ecapa_loss=0.0001856, whisper_loss=0.04748, over 15025.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01135, ecapa_loss=0.0002023, whisper_loss=0.09413, over 3912073.58 frames. ], batch size: 61, lr: 7.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:51:36,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1081670.0, ans=0.0 2024-08-11 11:51:42,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1081670.0, ans=0.125 2024-08-11 11:52:02,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1081870.0, ans=0.0 2024-08-11 11:52:03,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1081870.0, ans=0.0 2024-08-11 11:52:08,053 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
28 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 11:52:09,797 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2024-08-11 11:52:12,106 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 11:52:14,841 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6750, loss[loss=0.1256, beats_loss=0.007679, ecapa_loss=0.0002331, whisper_loss=0.1156, over 18941.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01135, ecapa_loss=0.0002014, whisper_loss=0.094, over 3898486.60 frames. ], batch size: 75, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:52:18,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1081970.0, ans=0.125 2024-08-11 11:52:18,902 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.942e+01 3.557e+01 4.197e+01 2.407e+02, threshold=7.114e+01, percent-clipped=7.0 2024-08-11 11:52:22,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1081970.0, ans=0.125 2024-08-11 11:52:26,978 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.27 vs. limit=15.0 2024-08-11 11:52:38,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1082070.0, ans=0.0 2024-08-11 11:52:41,100 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 11:52:53,998 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 11:53:08,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1082270.0, ans=0.125 2024-08-11 11:53:27,017 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6800, loss[loss=0.08675, beats_loss=0.01415, ecapa_loss=0.0002035, whisper_loss=0.07056, over 21489.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01128, ecapa_loss=0.0002021, whisper_loss=0.09391, over 3903382.75 frames. ], batch size: 93, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:53:37,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1082470.0, ans=0.0 2024-08-11 11:53:42,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.74 vs. limit=12.0 2024-08-11 11:53:48,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1082570.0, ans=0.0 2024-08-11 11:54:00,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1082670.0, ans=0.125 2024-08-11 11:54:10,223 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 11:54:10,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1082770.0, ans=0.125 2024-08-11 11:54:15,829 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-11 11:54:23,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1082770.0, ans=0.0 2024-08-11 11:54:26,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1082870.0, ans=0.0 2024-08-11 11:54:26,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1082870.0, ans=0.2 2024-08-11 11:54:39,985 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6850, loss[loss=0.1045, beats_loss=0.01279, ecapa_loss=0.0001333, whisper_loss=0.09041, over 19307.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0113, ecapa_loss=0.0002013, whisper_loss=0.09411, over 3900792.29 frames. ], batch size: 74, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:54:42,366 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2024-08-11 11:54:44,202 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.694e+01 2.999e+01 3.363e+01 5.238e+01, threshold=5.998e+01, percent-clipped=0.0 2024-08-11 11:54:54,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1083070.0, ans=0.1 2024-08-11 11:55:14,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1083170.0, ans=0.125 2024-08-11 11:55:26,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1083270.0, ans=0.1 2024-08-11 11:55:31,494 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 11:55:44,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1083370.0, ans=0.04949747468305833 2024-08-11 11:55:44,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1083370.0, ans=0.0 2024-08-11 11:55:49,782 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6900, loss[loss=0.105, beats_loss=0.0124, ecapa_loss=0.0001826, whisper_loss=0.09073, over 17894.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01134, ecapa_loss=0.0002017, whisper_loss=0.09384, over 3896019.17 frames. ], batch size: 70, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:55:51,245 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 25 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-11 11:55:55,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1083470.0, ans=0.125 2024-08-11 11:56:09,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1083570.0, ans=0.1 2024-08-11 11:56:12,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1083570.0, ans=0.1 2024-08-11 11:56:15,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1083570.0, ans=0.125 2024-08-11 11:56:30,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.37 vs. limit=15.0 2024-08-11 11:56:33,403 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 11:56:57,365 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 6950, loss[loss=0.1269, beats_loss=0.009387, ecapa_loss=0.0001737, whisper_loss=0.1157, over 21214.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01137, ecapa_loss=0.0002009, whisper_loss=0.09369, over 3889937.47 frames. ], batch size: 78, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:56:57,560 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 11:57:01,683 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+01 2.671e+01 2.938e+01 3.749e+01 5.482e+01, threshold=5.876e+01, percent-clipped=0.0 2024-08-11 11:57:01,843 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 11:57:05,489 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=8.0 2024-08-11 11:57:10,248 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 11:57:12,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1084070.0, ans=0.0 2024-08-11 11:57:17,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1084070.0, ans=0.1 2024-08-11 11:57:20,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1084070.0, ans=0.1 2024-08-11 11:57:40,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1084270.0, ans=15.0 2024-08-11 11:57:49,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.18 vs. 
limit=6.0 2024-08-11 11:58:04,511 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7000, loss[loss=0.1158, beats_loss=0.009923, ecapa_loss=0.0002231, whisper_loss=0.1036, over 21337.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01136, ecapa_loss=0.0002006, whisper_loss=0.09404, over 3890267.68 frames. ], batch size: 88, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:58:06,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1084470.0, ans=0.1 2024-08-11 11:58:31,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1084670.0, ans=0.0 2024-08-11 11:58:41,441 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 11:58:50,949 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 11:58:52,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1084770.0, ans=0.125 2024-08-11 11:58:55,986 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 11:59:06,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.03 vs. limit=6.0 2024-08-11 11:59:08,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1084870.0, ans=0.1 2024-08-11 11:59:11,984 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7050, loss[loss=0.1122, beats_loss=0.009061, ecapa_loss=0.0001644, whisper_loss=0.1015, over 15344.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01132, ecapa_loss=0.0002009, whisper_loss=0.09359, over 3841365.60 frames. 
], batch size: 58, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:59:12,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1084970.0, ans=0.125 2024-08-11 11:59:15,913 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.647e+01 2.921e+01 3.539e+01 5.654e+01, threshold=5.842e+01, percent-clipped=0.0 2024-08-11 11:59:17,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1084970.0, ans=0.2 2024-08-11 11:59:19,139 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.066e+00 2024-08-11 11:59:21,458 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 11:59:29,986 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=9.141e-01 2024-08-11 11:59:41,552 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 11:59:43,031 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 11:59:56,501 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 12:00:04,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1085370.0, ans=0.04949747468305833 2024-08-11 12:00:08,022 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.06 vs. 
limit=15.0 2024-08-11 12:00:10,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1085370.0, ans=0.125 2024-08-11 12:00:11,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1085370.0, ans=0.1 2024-08-11 12:00:19,312 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7100, loss[loss=0.08318, beats_loss=0.008851, ecapa_loss=0.0002473, whisper_loss=0.07186, over 15605.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01135, ecapa_loss=0.0002007, whisper_loss=0.09355, over 3847578.20 frames. ], batch size: 64, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:00:55,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1085670.0, ans=0.1 2024-08-11 12:00:59,225 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 12:01:20,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.77 vs. limit=22.5 2024-08-11 12:01:23,206 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 12:01:25,616 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7150, loss[loss=0.09735, beats_loss=0.01429, ecapa_loss=0.000166, whisper_loss=0.08139, over 21458.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01135, ecapa_loss=0.0001989, whisper_loss=0.09302, over 3816258.89 frames. ], batch size: 87, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:01:28,486 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 12:01:29,707 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.825e+01 3.133e+01 3.530e+01 6.975e+01, threshold=6.267e+01, percent-clipped=1.0 2024-08-11 12:01:30,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1085970.0, ans=0.125 2024-08-11 12:01:36,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1085970.0, ans=0.125 2024-08-11 12:01:37,783 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 12:01:48,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1086070.0, ans=0.0 2024-08-11 12:02:23,401 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 12:02:28,889 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 12:02:32,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7200, loss[loss=0.1008, beats_loss=0.01117, ecapa_loss=0.0002415, whisper_loss=0.08722, over 17035.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01137, ecapa_loss=0.0001987, whisper_loss=0.09371, over 3845744.40 frames. ], batch size: 70, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:02:35,236 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 12:03:19,352 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.36 vs. 
limit=15.0 2024-08-11 12:03:38,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1086870.0, ans=0.0 2024-08-11 12:03:39,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1086970.0, ans=0.0 2024-08-11 12:03:40,449 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7250, loss[loss=0.1, beats_loss=0.009785, ecapa_loss=0.0002501, whisper_loss=0.08775, over 20297.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01133, ecapa_loss=0.0001991, whisper_loss=0.09434, over 3874143.44 frames. ], batch size: 83, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:03:44,516 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.767e+01 3.129e+01 3.597e+01 6.037e+01, threshold=6.257e+01, percent-clipped=0.0 2024-08-11 12:04:02,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.86 vs. limit=15.0 2024-08-11 12:04:32,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1087370.0, ans=0.1 2024-08-11 12:04:35,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1087370.0, ans=0.09899494936611666 2024-08-11 12:04:40,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1087370.0, ans=0.1 2024-08-11 12:04:47,596 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7300, loss[loss=0.09661, beats_loss=0.01285, ecapa_loss=0.0002303, whisper_loss=0.08145, over 20714.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01129, ecapa_loss=0.0002007, whisper_loss=0.09483, over 3889294.75 frames. 
], batch size: 84, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:04:54,458 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 12:05:08,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1087570.0, ans=0.2 2024-08-11 12:05:12,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1087570.0, ans=0.2 2024-08-11 12:05:22,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1087670.0, ans=0.125 2024-08-11 12:05:24,519 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-11 12:05:29,537 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-11 12:05:38,576 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-11 12:05:46,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1087870.0, ans=0.125 2024-08-11 12:05:55,849 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7350, loss[loss=0.1033, beats_loss=0.0097, ecapa_loss=0.0002328, whisper_loss=0.09129, over 19371.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01132, ecapa_loss=0.0002016, whisper_loss=0.0944, over 3891458.61 frames. ], batch size: 78, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:05:58,482 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
17 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 12:05:59,598 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.629e+01 2.975e+01 3.413e+01 5.829e+01, threshold=5.951e+01, percent-clipped=0.0 2024-08-11 12:06:00,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1087970.0, ans=0.0 2024-08-11 12:06:04,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1087970.0, ans=0.0 2024-08-11 12:06:44,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1088270.0, ans=0.2 2024-08-11 12:07:02,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=1088470.0, ans=0.1 2024-08-11 12:07:03,670 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7400, loss[loss=0.112, beats_loss=0.009448, ecapa_loss=0.0001973, whisper_loss=0.1006, over 15780.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01139, ecapa_loss=0.0001995, whisper_loss=0.09345, over 3882191.79 frames. ], batch size: 59, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:07:05,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1088470.0, ans=0.125 2024-08-11 12:07:24,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1088570.0, ans=0.125 2024-08-11 12:07:45,511 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. 
limit=15.0 2024-08-11 12:07:46,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1088770.0, ans=0.125 2024-08-11 12:07:58,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1088870.0, ans=0.0 2024-08-11 12:07:59,957 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 12:08:03,948 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 12:08:10,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7450, loss[loss=0.09078, beats_loss=0.01199, ecapa_loss=0.0001995, whisper_loss=0.0768, over 15780.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01132, ecapa_loss=0.0002004, whisper_loss=0.09407, over 3890894.14 frames. ], batch size: 64, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:08:14,030 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.716e+01 3.101e+01 3.669e+01 6.917e+01, threshold=6.202e+01, percent-clipped=1.0 2024-08-11 12:09:00,139 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2024-08-11 12:09:21,133 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7500, loss[loss=0.115, beats_loss=0.01153, ecapa_loss=0.0001967, whisper_loss=0.1015, over 22420.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01136, ecapa_loss=0.0002008, whisper_loss=0.09354, over 3913869.80 frames. 
], batch size: 92, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:09:34,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1089570.0, ans=0.125 2024-08-11 12:09:39,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1089570.0, ans=0.125 2024-08-11 12:09:44,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1089570.0, ans=0.0 2024-08-11 12:09:44,932 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-11 12:10:02,088 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 12:10:05,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1089770.0, ans=0.125 2024-08-11 12:10:09,017 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 12:10:10,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1089770.0, ans=0.125 2024-08-11 12:10:25,598 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 12:10:28,831 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-11 12:10:32,553 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7550, loss[loss=0.08209, beats_loss=0.01419, ecapa_loss=0.0001959, whisper_loss=0.06594, over 20958.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01139, ecapa_loss=0.000201, whisper_loss=0.09325, over 3866512.22 frames. 
], batch size: 88, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:10:36,606 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+01 2.654e+01 2.939e+01 3.334e+01 5.450e+01, threshold=5.879e+01, percent-clipped=0.0 2024-08-11 12:10:38,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1089970.0, ans=0.125 2024-08-11 12:10:39,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1089970.0, ans=0.125 2024-08-11 12:10:42,626 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 12:10:51,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1090070.0, ans=0.125 2024-08-11 12:11:05,737 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0 2024-08-11 12:11:20,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1090270.0, ans=0.025 2024-08-11 12:11:26,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1090270.0, ans=0.125 2024-08-11 12:11:37,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1090370.0, ans=0.0 2024-08-11 12:11:41,037 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 12:11:44,018 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7600, loss[loss=0.1031, beats_loss=0.01011, ecapa_loss=0.0001595, whisper_loss=0.09143, over 18601.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01137, ecapa_loss=0.0002013, whisper_loss=0.09278, over 3843808.60 frames. 
], batch size: 68, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:11:50,484 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0 2024-08-11 12:12:12,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0 2024-08-11 12:12:35,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1090770.0, ans=0.125 2024-08-11 12:12:43,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1090870.0, ans=0.1 2024-08-11 12:12:52,359 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7650, loss[loss=0.09729, beats_loss=0.0124, ecapa_loss=0.0001853, whisper_loss=0.08303, over 15666.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01131, ecapa_loss=0.0002002, whisper_loss=0.09326, over 3849042.27 frames. ], batch size: 61, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:12:56,611 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.822e+01 3.132e+01 3.571e+01 5.523e+01, threshold=6.263e+01, percent-clipped=0.0 2024-08-11 12:13:09,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1091070.0, ans=0.1 2024-08-11 12:13:21,660 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.354e+00 2024-08-11 12:13:59,604 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7700, loss[loss=0.1336, beats_loss=0.007847, ecapa_loss=0.0002529, whisper_loss=0.1232, over 15543.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0113, ecapa_loss=0.0002006, whisper_loss=0.09322, over 3819485.37 frames. 
], batch size: 59, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:14:01,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2024-08-11 12:14:02,589 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-11 12:14:09,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1091470.0, ans=0.05 2024-08-11 12:14:40,905 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 12:14:51,682 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 12:15:02,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1091870.0, ans=0.125 2024-08-11 12:15:05,933 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7750, loss[loss=0.09398, beats_loss=0.01396, ecapa_loss=0.0001838, whisper_loss=0.07818, over 21976.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01133, ecapa_loss=0.0002014, whisper_loss=0.09226, over 3844692.37 frames. ], batch size: 93, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:15:10,015 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.756e+01 3.140e+01 3.838e+01 1.235e+02, threshold=6.279e+01, percent-clipped=2.0 2024-08-11 12:15:17,954 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 12:15:24,249 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 12:15:26,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1092070.0, ans=0.125 2024-08-11 12:15:33,273 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-11 12:15:39,896 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 12:15:43,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1092270.0, ans=0.125 2024-08-11 12:15:47,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1092270.0, ans=0.125 2024-08-11 12:15:52,974 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-11 12:16:02,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1092370.0, ans=0.0 2024-08-11 12:16:07,078 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 12:16:09,795 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 12:16:10,835 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7800, loss[loss=0.111, beats_loss=0.009214, ecapa_loss=0.000277, whisper_loss=0.09905, over 17530.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01136, ecapa_loss=0.0001991, whisper_loss=0.09267, over 3847392.60 frames. ], batch size: 73, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:16:15,268 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 12:16:23,951 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-08-11 12:16:27,291 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 12:16:40,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1092670.0, ans=0.0 2024-08-11 12:16:47,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2024-08-11 12:17:12,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0 2024-08-11 12:17:17,587 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7850, loss[loss=0.1084, beats_loss=0.01106, ecapa_loss=0.0001981, whisper_loss=0.09539, over 16622.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01132, ecapa_loss=0.0001989, whisper_loss=0.0931, over 3855245.32 frames. ], batch size: 66, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:17:21,542 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.734e+01 3.036e+01 3.446e+01 5.621e+01, threshold=6.073e+01, percent-clipped=0.0 2024-08-11 12:17:32,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1093070.0, ans=0.0 2024-08-11 12:17:40,710 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 12:18:05,949 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-11 12:18:09,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1093270.0, ans=0.04949747468305833 2024-08-11 12:18:24,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7900, loss[loss=0.1145, beats_loss=0.00962, ecapa_loss=0.0002136, whisper_loss=0.1027, over 19895.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01143, ecapa_loss=0.0001982, whisper_loss=0.09258, over 3854496.14 frames. 
], batch size: 75, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:18:28,839 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 12:18:41,967 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 12:18:52,563 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.957e+02 2024-08-11 12:19:03,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1093770.0, ans=15.0 2024-08-11 12:19:08,524 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 37 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 12:19:11,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1093770.0, ans=0.0 2024-08-11 12:19:13,526 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 12:19:13,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1093770.0, ans=0.0 2024-08-11 12:19:29,643 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 12:19:30,697 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 7950, loss[loss=0.09225, beats_loss=0.0135, ecapa_loss=0.0002003, whisper_loss=0.07675, over 20798.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01147, ecapa_loss=0.0001991, whisper_loss=0.09269, over 3863927.00 frames. ], batch size: 88, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:19:34,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.750e+01 3.082e+01 3.483e+01 5.642e+01, threshold=6.163e+01, percent-clipped=0.0 2024-08-11 12:19:35,190 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
20 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-11 12:19:39,677 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=15.0 2024-08-11 12:19:55,334 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-11 12:20:07,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1094170.0, ans=0.2 2024-08-11 12:20:11,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1094270.0, ans=0.125 2024-08-11 12:20:30,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1094370.0, ans=0.125 2024-08-11 12:20:37,154 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0 2024-08-11 12:20:37,613 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8000, loss[loss=0.1139, beats_loss=0.009575, ecapa_loss=0.0002298, whisper_loss=0.1021, over 22267.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01136, ecapa_loss=0.0002003, whisper_loss=0.09302, over 3846892.24 frames. ], batch size: 91, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:20:45,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1094470.0, ans=0.125 2024-08-11 12:20:47,015 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 12:20:50,083 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
22 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-11 12:21:13,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1094670.0, ans=0.0 2024-08-11 12:21:13,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1094670.0, ans=0.0 2024-08-11 12:21:17,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1094770.0, ans=0.2 2024-08-11 12:21:24,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1094770.0, ans=0.125 2024-08-11 12:21:30,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1094870.0, ans=0.125 2024-08-11 12:21:31,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1094870.0, ans=0.125 2024-08-11 12:21:34,219 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 12:21:44,754 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8050, loss[loss=0.107, beats_loss=0.01184, ecapa_loss=0.0001684, whisper_loss=0.09344, over 18892.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01134, ecapa_loss=0.0001994, whisper_loss=0.09299, over 3864043.71 frames. ], batch size: 75, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:21:44,976 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
20 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-11 12:21:48,655 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.683e+01 3.112e+01 3.562e+01 5.362e+01, threshold=6.224e+01, percent-clipped=0.0 2024-08-11 12:22:02,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1095070.0, ans=0.125 2024-08-11 12:22:14,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1095170.0, ans=0.125 2024-08-11 12:22:40,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1095370.0, ans=0.1 2024-08-11 12:22:43,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1095370.0, ans=0.125 2024-08-11 12:22:48,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1095370.0, ans=0.1 2024-08-11 12:22:52,171 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8100, loss[loss=0.1048, beats_loss=0.01173, ecapa_loss=0.0002062, whisper_loss=0.09098, over 17545.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0114, ecapa_loss=0.0002001, whisper_loss=0.09277, over 3872596.85 frames. ], batch size: 70, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:22:54,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1095470.0, ans=0.1 2024-08-11 12:23:03,090 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 12:23:05,766 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 12:23:08,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1095570.0, ans=0.1 2024-08-11 12:23:10,755 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 12:23:20,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1095670.0, ans=0.125 2024-08-11 12:23:27,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1095670.0, ans=0.0 2024-08-11 12:23:50,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2024-08-11 12:23:57,905 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-11 12:23:58,960 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8150, loss[loss=0.1121, beats_loss=0.01197, ecapa_loss=0.000123, whisper_loss=0.09893, over 20396.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01131, ecapa_loss=0.0002014, whisper_loss=0.0936, over 3882856.28 frames. ], batch size: 74, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:24:03,131 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.662e+01 2.951e+01 3.382e+01 5.794e+01, threshold=5.903e+01, percent-clipped=0.0 2024-08-11 12:24:03,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1095970.0, ans=0.125 2024-08-11 12:24:14,219 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
31 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 12:24:14,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1096070.0, ans=0.125 2024-08-11 12:24:15,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2024-08-11 12:24:32,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1096170.0, ans=0.2 2024-08-11 12:25:06,401 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8200, loss[loss=0.09679, beats_loss=0.01209, ecapa_loss=0.0001459, whisper_loss=0.08324, over 23334.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01126, ecapa_loss=0.0002, whisper_loss=0.09403, over 3881086.31 frames. ], batch size: 91, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:25:06,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1096470.0, ans=0.0 2024-08-11 12:25:16,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1096470.0, ans=0.125 2024-08-11 12:25:24,995 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 12:25:27,583 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 12:25:27,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1096570.0, ans=0.015 2024-08-11 12:25:27,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1096570.0, ans=0.09899494936611666 2024-08-11 12:25:33,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1096670.0, ans=0.1 2024-08-11 12:25:50,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1096770.0, ans=0.125 2024-08-11 12:25:51,832 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 12:25:56,249 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0 2024-08-11 12:26:05,683 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 12:26:11,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1096970.0, ans=0.0 2024-08-11 12:26:12,518 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8250, loss[loss=0.07861, beats_loss=0.01417, ecapa_loss=0.0001567, whisper_loss=0.06287, over 20643.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01132, ecapa_loss=0.0001981, whisper_loss=0.09347, over 3874540.92 frames. 
], batch size: 88, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:26:16,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.782e+01 3.103e+01 3.474e+01 6.879e+01, threshold=6.206e+01, percent-clipped=1.0 2024-08-11 12:26:18,375 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 12:26:44,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1097170.0, ans=0.2 2024-08-11 12:26:52,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1097270.0, ans=0.0 2024-08-11 12:26:57,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1097270.0, ans=0.125 2024-08-11 12:27:07,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1097370.0, ans=0.0 2024-08-11 12:27:07,359 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=15.0 2024-08-11 12:27:19,896 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8300, loss[loss=0.1204, beats_loss=0.01011, ecapa_loss=0.0001989, whisper_loss=0.1083, over 23187.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01134, ecapa_loss=0.0001985, whisper_loss=0.09324, over 3869093.32 frames. 
], batch size: 92, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:27:22,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1097470.0, ans=0.1 2024-08-11 12:27:24,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1097470.0, ans=0.125 2024-08-11 12:27:27,621 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 12:27:48,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1097670.0, ans=0.0 2024-08-11 12:27:49,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1097670.0, ans=0.0 2024-08-11 12:27:58,537 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-11 12:28:01,385 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 20 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-11 12:28:26,307 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8350, loss[loss=0.1263, beats_loss=0.008982, ecapa_loss=0.0002131, whisper_loss=0.1152, over 19535.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01143, ecapa_loss=0.0001972, whisper_loss=0.09282, over 3911912.16 frames. ], batch size: 76, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:28:30,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.714e+01 3.261e+01 3.683e+01 6.544e+01, threshold=6.523e+01, percent-clipped=1.0 2024-08-11 12:28:42,794 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
28 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 12:28:44,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1098070.0, ans=0.07 2024-08-11 12:28:49,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1098070.0, ans=0.0 2024-08-11 12:29:00,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=15.0 2024-08-11 12:29:01,589 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-11 12:29:08,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1098270.0, ans=0.1 2024-08-11 12:29:22,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1098370.0, ans=0.125 2024-08-11 12:29:34,014 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8400, loss[loss=0.1039, beats_loss=0.0104, ecapa_loss=0.0001982, whisper_loss=0.09156, over 21308.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01144, ecapa_loss=0.0001984, whisper_loss=0.09236, over 3895190.36 frames. ], batch size: 86, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:29:38,718 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.53 vs. limit=10.0 2024-08-11 12:29:40,598 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
27 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 12:30:18,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1098770.0, ans=0.0 2024-08-11 12:30:21,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1098770.0, ans=0.125 2024-08-11 12:30:22,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1098770.0, ans=0.125 2024-08-11 12:30:24,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1098770.0, ans=0.0 2024-08-11 12:30:29,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1098870.0, ans=0.125 2024-08-11 12:30:38,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1098870.0, ans=0.1 2024-08-11 12:30:40,616 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8450, loss[loss=0.1059, beats_loss=0.01051, ecapa_loss=0.0001827, whisper_loss=0.09359, over 18997.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01141, ecapa_loss=0.0002007, whisper_loss=0.09193, over 3893882.34 frames. ], batch size: 72, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:30:44,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.709e+01 3.054e+01 3.505e+01 4.740e+01, threshold=6.108e+01, percent-clipped=0.0 2024-08-11 12:30:45,027 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-11 12:30:54,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1099070.0, ans=0.0 2024-08-11 12:31:41,817 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
37 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 12:31:46,301 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-11 12:31:46,968 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8500, loss[loss=0.1035, beats_loss=0.0131, ecapa_loss=0.0001688, whisper_loss=0.08868, over 17139.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.0113, ecapa_loss=0.0002007, whisper_loss=0.09349, over 3892010.46 frames. ], batch size: 65, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:32:03,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1099570.0, ans=0.1 2024-08-11 12:32:32,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1099770.0, ans=0.0 2024-08-11 12:32:41,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1099870.0, ans=0.1 2024-08-11 12:32:56,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8550, loss[loss=0.1272, beats_loss=0.009984, ecapa_loss=0.0002325, whisper_loss=0.1149, over 21662.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01131, ecapa_loss=0.0002006, whisper_loss=0.09338, over 3869541.66 frames. ], batch size: 89, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:33:00,569 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.728e+01 3.009e+01 3.613e+01 5.860e+01, threshold=6.017e+01, percent-clipped=0.0 2024-08-11 12:33:08,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.17 vs. 
limit=22.5 2024-08-11 12:33:24,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1100170.0, ans=0.125 2024-08-11 12:34:05,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1100370.0, ans=0.015 2024-08-11 12:34:11,116 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8600, loss[loss=0.1089, beats_loss=0.01028, ecapa_loss=0.0002634, whisper_loss=0.09603, over 13305.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01119, ecapa_loss=0.0002005, whisper_loss=0.09415, over 3864715.29 frames. ], batch size: 59, lr: 7.85e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:34:18,042 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 12:34:28,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1100570.0, ans=0.2 2024-08-11 12:34:31,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1100570.0, ans=0.05 2024-08-11 12:34:39,318 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 12:34:42,004 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-11 12:34:44,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1100670.0, ans=0.125 2024-08-11 12:35:07,947 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 12:35:18,137 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 12:35:19,554 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 12:35:26,723 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8650, loss[loss=0.1075, beats_loss=0.01041, ecapa_loss=0.0001717, whisper_loss=0.09539, over 19542.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01118, ecapa_loss=0.0002019, whisper_loss=0.09387, over 3853242.94 frames. ], batch size: 72, lr: 7.85e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:35:31,151 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.631e+01 2.958e+01 3.559e+01 6.258e+01, threshold=5.915e+01, percent-clipped=1.0 2024-08-11 12:35:56,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-08-11 12:36:16,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1101270.0, ans=10.0 2024-08-11 12:36:17,234 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 12:36:20,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1101270.0, ans=0.125 2024-08-11 12:36:39,338 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 12:36:47,133 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8700, loss[loss=0.1118, beats_loss=0.008554, ecapa_loss=0.0002552, whisper_loss=0.1006, over 14297.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01129, ecapa_loss=0.0002004, whisper_loss=0.09368, over 3846966.16 frames. ], batch size: 58, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:36:47,408 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
36 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 12:36:53,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1101470.0, ans=0.07 2024-08-11 12:36:57,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1101470.0, ans=0.2 2024-08-11 12:37:24,278 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 12:37:27,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1101670.0, ans=0.5 2024-08-11 12:37:36,959 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 12:37:48,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1101770.0, ans=0.125 2024-08-11 12:38:09,666 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 12:38:11,082 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8750, loss[loss=0.09953, beats_loss=0.0114, ecapa_loss=0.0001523, whisper_loss=0.0866, over 19679.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01132, ecapa_loss=0.000201, whisper_loss=0.093, over 3864672.54 frames. ], batch size: 74, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:38:15,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.807e+01 3.199e+01 3.848e+01 5.840e+01, threshold=6.398e+01, percent-clipped=0.0 2024-08-11 12:38:20,570 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 12:38:35,702 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-11 12:38:52,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1102170.0, ans=0.125 2024-08-11 12:38:53,275 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 12:39:07,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1102270.0, ans=0.1 2024-08-11 12:39:17,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1102370.0, ans=0.1 2024-08-11 12:39:25,787 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8800, loss[loss=0.1001, beats_loss=0.01172, ecapa_loss=0.0001908, whisper_loss=0.08644, over 21196.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01136, ecapa_loss=0.0002007, whisper_loss=0.09295, over 3853739.65 frames. ], batch size: 86, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:39:28,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1102470.0, ans=0.125 2024-08-11 12:40:01,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1102670.0, ans=0.0 2024-08-11 12:40:44,028 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8850, loss[loss=0.1051, beats_loss=0.01156, ecapa_loss=0.0001592, whisper_loss=0.09193, over 18482.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01141, ecapa_loss=0.0001994, whisper_loss=0.09282, over 3872575.01 frames. 
], batch size: 70, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:40:44,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1102970.0, ans=0.1 2024-08-11 12:40:48,186 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.779e+01 3.220e+01 3.967e+01 6.531e+01, threshold=6.439e+01, percent-clipped=1.0 2024-08-11 12:41:33,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1103270.0, ans=0.0 2024-08-11 12:41:35,415 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 12:41:40,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1103270.0, ans=0.1 2024-08-11 12:41:52,496 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 11 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 12:42:06,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8900, loss[loss=0.1106, beats_loss=0.01171, ecapa_loss=0.0001759, whisper_loss=0.09709, over 17402.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01145, ecapa_loss=0.0001992, whisper_loss=0.09237, over 3852637.85 frames. ], batch size: 67, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:42:09,722 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 12:42:10,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1103470.0, ans=0.2 2024-08-11 12:42:18,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1103470.0, ans=0.2 2024-08-11 12:42:24,667 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 12:42:35,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1103570.0, ans=0.09899494936611666 2024-08-11 12:42:39,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1103670.0, ans=0.0 2024-08-11 12:42:42,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1103670.0, ans=0.2 2024-08-11 12:42:51,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1103670.0, ans=0.125 2024-08-11 12:42:53,812 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-11 12:42:55,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1103770.0, ans=0.2 2024-08-11 12:43:09,910 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 13 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 12:43:17,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1103870.0, ans=0.0 2024-08-11 12:43:24,424 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 8950, loss[loss=0.105, beats_loss=0.007229, ecapa_loss=0.0002189, whisper_loss=0.09556, over 18044.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01144, ecapa_loss=0.0001995, whisper_loss=0.09214, over 3830158.16 frames. 
], batch size: 68, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:43:28,708 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.747e+01 3.145e+01 3.619e+01 5.572e+01, threshold=6.290e+01, percent-clipped=0.0 2024-08-11 12:43:32,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1103970.0, ans=0.015 2024-08-11 12:43:47,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1104070.0, ans=0.1 2024-08-11 12:43:56,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1104170.0, ans=0.125 2024-08-11 12:44:09,764 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 12:44:11,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1104270.0, ans=0.0 2024-08-11 12:44:27,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1104370.0, ans=0.0 2024-08-11 12:44:38,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1104470.0, ans=0.0 2024-08-11 12:44:39,432 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9000, loss[loss=0.09041, beats_loss=0.01133, ecapa_loss=0.0001544, whisper_loss=0.07754, over 14942.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01139, ecapa_loss=0.0002015, whisper_loss=0.09265, over 3836864.79 frames. ], batch size: 58, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:44:39,433 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 12:45:15,307 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on ASR_libri: loss=0.2575, beats_loss=0, ecapa_loss=0.0006551, whisper_loss=0.2509, over 922467.00 frames. 
2024-08-11 12:45:34,132 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on SV_voxceleb1: loss=0.005315, beats_loss=0, ecapa_loss=0.0005315, whisper_loss=0, over 939242.00 frames. 2024-08-11 12:47:19,786 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on AT_audioset: loss=0.02529, beats_loss=0.02529, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 12:47:19,790 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 12:47:22,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1104470.0, ans=0.0 2024-08-11 12:47:26,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1104470.0, ans=0.125 2024-08-11 12:47:38,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.50 vs. limit=15.0 2024-08-11 12:48:12,587 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 12:48:17,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1104770.0, ans=0.0 2024-08-11 12:48:28,864 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 12:48:30,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1104870.0, ans=0.125 2024-08-11 12:48:36,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9050, loss[loss=0.109, beats_loss=0.01214, ecapa_loss=0.0001871, whisper_loss=0.09498, over 22516.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01138, ecapa_loss=0.0001997, whisper_loss=0.09283, over 3865390.19 frames. 
], batch size: 92, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:48:41,050 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.751e+01 3.167e+01 3.446e+01 7.186e+01, threshold=6.334e+01, percent-clipped=1.0 2024-08-11 12:48:48,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1104970.0, ans=0.0 2024-08-11 12:49:14,141 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 12:49:16,978 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 12:49:18,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1105170.0, ans=0.05 2024-08-11 12:49:23,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1105270.0, ans=0.0 2024-08-11 12:49:24,593 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 12:49:42,909 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 12:49:44,434 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 12:49:51,246 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.92 vs. limit=15.0 2024-08-11 12:49:53,342 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9100, loss[loss=0.1076, beats_loss=0.008111, ecapa_loss=0.0002126, whisper_loss=0.09733, over 15850.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01128, ecapa_loss=0.0002007, whisper_loss=0.09345, over 3879121.47 frames. 
], batch size: 63, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:50:16,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1105570.0, ans=0.0 2024-08-11 12:50:21,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1105570.0, ans=0.5 2024-08-11 12:50:35,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1105670.0, ans=0.1 2024-08-11 12:50:40,924 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-11 12:50:42,252 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 12:50:42,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1105770.0, ans=0.125 2024-08-11 12:50:51,892 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 12:51:05,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1105870.0, ans=0.0 2024-08-11 12:51:10,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9150, loss[loss=0.1282, beats_loss=0.008194, ecapa_loss=0.0002172, whisper_loss=0.1178, over 18927.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01133, ecapa_loss=0.0002004, whisper_loss=0.09358, over 3918778.68 frames. 
], batch size: 76, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:51:14,372 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.723e+01 3.003e+01 3.393e+01 4.790e+01, threshold=6.006e+01, percent-clipped=0.0 2024-08-11 12:51:16,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1105970.0, ans=0.0 2024-08-11 12:51:18,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1105970.0, ans=0.125 2024-08-11 12:51:22,775 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-11 12:51:45,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1106170.0, ans=0.1 2024-08-11 12:51:54,190 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 20 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-11 12:52:18,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1106370.0, ans=10.0 2024-08-11 12:52:21,600 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.491e-01 2024-08-11 12:52:23,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0 2024-08-11 12:52:25,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9200, loss[loss=0.09751, beats_loss=0.01138, ecapa_loss=0.000206, whisper_loss=0.08407, over 21600.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01135, ecapa_loss=0.0002015, whisper_loss=0.09377, over 3890091.91 frames. 
], batch size: 90, lr: 7.83e-03, grad_scale: 1.152921504606847e+18
2024-08-11 12:52:29,578 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.130e-01
2024-08-11 12:52:37,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1106470.0, ans=0.125
2024-08-11 12:52:38,881 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 18 from Vox, 31 from AS
2024-08-11 12:52:45,836 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 13 from Vox, 34 from AS
2024-08-11 12:52:48,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.76 vs. limit=15.0
2024-08-11 12:52:50,203 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 16 from Vox, 41 from AS
2024-08-11 12:52:54,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1106570.0, ans=0.0
2024-08-11 12:53:20,947 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 from AS
2024-08-11 12:53:31,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1106870.0, ans=0.2
2024-08-11 12:53:33,507 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 30 from Vox, 30 from AS
2024-08-11 12:53:42,592 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9250, loss[loss=0.08751, beats_loss=0.01239, ecapa_loss=0.0002079, whisper_loss=0.07304, over 16017.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01134, ecapa_loss=0.0002027, whisper_loss=0.09348, over 3894447.78 frames.
], batch size: 67, lr: 7.82e-03, grad_scale: 1.152921504606847e+18
2024-08-11 12:53:47,030 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.771e+01 3.106e+01 3.599e+01 1.159e+02, threshold=6.212e+01, percent-clipped=1.0
2024-08-11 12:53:59,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1107070.0, ans=0.0
2024-08-11 12:54:08,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1107070.0, ans=0.125
2024-08-11 12:54:25,599 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 from AS
2024-08-11 12:54:43,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1107370.0, ans=0.0
2024-08-11 12:54:43,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1107370.0, ans=0.125
2024-08-11 12:54:46,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1107370.0, ans=0.5
2024-08-11 12:54:57,083 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9300, loss[loss=0.08915, beats_loss=0.0119, ecapa_loss=0.0002564, whisper_loss=0.07468, over 22289.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01144, ecapa_loss=0.0002008, whisper_loss=0.09234, over 3880884.97 frames. ], batch size: 93, lr: 7.82e-03, grad_scale: 1.152921504606847e+18
2024-08-11 12:55:00,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1107470.0, ans=0.125
2024-08-11 12:55:16,419 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts.
24 from LS+wenet, 19 from Vox, 39 from AS
2024-08-11 12:55:18,936 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.90 vs. limit=10.0
2024-08-11 12:55:29,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1107670.0, ans=0.0
2024-08-11 12:55:31,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0
2024-08-11 12:55:37,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1107670.0, ans=0.125
2024-08-11 12:55:37,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=12.0
2024-08-11 12:55:39,843 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.341e-01
2024-08-11 12:55:48,577 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 14 from Vox, 31 from AS
2024-08-11 12:55:57,417 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 from AS
2024-08-11 12:56:08,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0
2024-08-11 12:56:12,522 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9350, loss[loss=0.1116, beats_loss=0.01071, ecapa_loss=0.0002292, whisper_loss=0.09865, over 22167.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01144, ecapa_loss=0.0002019, whisper_loss=0.09197, over 3882132.97 frames.
], batch size: 89, lr: 7.82e-03, grad_scale: 1.152921504606847e+18
2024-08-11 12:56:17,347 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.772e+01 2.988e+01 3.438e+01 1.215e+02, threshold=5.975e+01, percent-clipped=1.0
2024-08-11 12:56:23,515 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 from AS
2024-08-11 12:56:23,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1107970.0, ans=0.1
2024-08-11 12:56:39,410 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 19 from Vox, 34 from AS
2024-08-11 12:56:40,625 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 15 from Vox, 43 from AS
2024-08-11 12:56:45,240 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 14 from Vox, 36 from AS
2024-08-11 12:56:54,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1108170.0, ans=0.0
2024-08-11 12:57:00,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1108270.0, ans=0.125
2024-08-11 12:57:14,959 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 from AS
2024-08-11 12:57:23,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1108370.0, ans=0.2
2024-08-11 12:57:28,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9400, loss[loss=0.09942, beats_loss=0.01274, ecapa_loss=0.000205, whisper_loss=0.08464, over 17131.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01145, ecapa_loss=0.0002007, whisper_loss=0.09232, over 3887917.85 frames. ], batch size: 70, lr: 7.82e-03, grad_scale: 1.152921504606847e+18
2024-08-11 12:57:44,352 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
21 from LS+wenet, 16 from Vox, 51 from AS
2024-08-11 12:57:50,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1108570.0, ans=0.07
2024-08-11 12:58:04,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1108670.0, ans=0.125
2024-08-11 12:58:21,060 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 25 from Vox, 30 from AS
2024-08-11 12:58:22,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1108770.0, ans=0.0
2024-08-11 12:58:27,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1108770.0, ans=0.1
2024-08-11 12:58:32,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1108870.0, ans=0.125
2024-08-11 12:58:44,452 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 from AS
2024-08-11 12:58:45,587 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9450, loss[loss=0.111, beats_loss=0.01036, ecapa_loss=0.000201, whisper_loss=0.09864, over 19321.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0114, ecapa_loss=0.0002, whisper_loss=0.09301, over 3879935.49 frames.
], batch size: 74, lr: 7.82e-03, grad_scale: 1.152921504606847e+18
2024-08-11 12:58:50,401 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.669e+01 3.064e+01 3.549e+01 5.554e+01, threshold=6.127e+01, percent-clipped=0.0
2024-08-11 12:59:00,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1109070.0, ans=0.125
2024-08-11 12:59:07,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1109070.0, ans=0.1
2024-08-11 12:59:15,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5
2024-08-11 12:59:22,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1109170.0, ans=0.0
2024-08-11 12:59:28,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1109170.0, ans=0.2
2024-08-11 13:00:00,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9500, loss[loss=0.1233, beats_loss=0.01135, ecapa_loss=0.0002186, whisper_loss=0.1098, over 14945.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01125, ecapa_loss=0.0001999, whisper_loss=0.09365, over 3868218.34 frames.
], batch size: 60, lr: 7.82e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:00:02,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1109470.0, ans=0.035
2024-08-11 13:00:08,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1109470.0, ans=0.2
2024-08-11 13:00:16,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1109570.0, ans=0.0
2024-08-11 13:00:21,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1109570.0, ans=0.1
2024-08-11 13:00:48,574 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0
2024-08-11 13:01:00,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1109870.0, ans=0.125
2024-08-11 13:01:00,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1109870.0, ans=0.1
2024-08-11 13:01:02,587 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 from AS
2024-08-11 13:01:08,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1109870.0, ans=0.0
2024-08-11 13:01:13,948 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9550, loss[loss=0.1145, beats_loss=0.01102, ecapa_loss=0.0001769, whisper_loss=0.1017, over 20783.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01126, ecapa_loss=0.0001999, whisper_loss=0.09259, over 3852294.92 frames.
], batch size: 79, lr: 7.81e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:01:18,261 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.581e+01 3.097e+01 3.550e+01 5.814e+01, threshold=6.195e+01, percent-clipped=0.0
2024-08-11 13:01:53,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0
2024-08-11 13:02:08,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1110270.0, ans=0.125
2024-08-11 13:02:28,570 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9600, loss[loss=0.1025, beats_loss=0.009585, ecapa_loss=0.0002116, whisper_loss=0.0908, over 18627.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01122, ecapa_loss=0.0001998, whisper_loss=0.09256, over 3846868.75 frames. ], batch size: 73, lr: 7.81e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:02:31,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1110470.0, ans=0.0
2024-08-11 13:02:54,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1110570.0, ans=0.1
2024-08-11 13:03:19,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1110770.0, ans=0.0
2024-08-11 13:03:30,899 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 from AS
2024-08-11 13:03:34,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1110870.0, ans=0.0
2024-08-11 13:03:39,532 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9650, loss[loss=0.1292, beats_loss=0.009201, ecapa_loss=0.000233, whisper_loss=0.1176, over 16463.00 frames.
], tot_loss[loss=0.1051, beats_loss=0.01131, ecapa_loss=0.0001995, whisper_loss=0.09176, over 3812543.31 frames. ], batch size: 67, lr: 7.81e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:03:43,492 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.716e+01 3.002e+01 3.574e+01 5.577e+01, threshold=6.004e+01, percent-clipped=0.0
2024-08-11 13:03:52,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1111070.0, ans=0.125
2024-08-11 13:04:07,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1111170.0, ans=0.2
2024-08-11 13:04:10,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1111170.0, ans=0.125
2024-08-11 13:04:16,420 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 35 from Vox, 33 from AS
2024-08-11 13:04:25,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1111270.0, ans=0.125
2024-08-11 13:04:28,503 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0
2024-08-11 13:04:43,460 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 from AS
2024-08-11 13:04:45,054 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 23 from Vox, 31 from AS
2024-08-11 13:04:47,942 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 21 from Vox, 27 from AS
2024-08-11 13:04:50,633 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9700, loss[loss=0.09452, beats_loss=0.01112, ecapa_loss=0.0001934, whisper_loss=0.08147, over 19733.00 frames.
], tot_loss[loss=0.1056, beats_loss=0.01129, ecapa_loss=0.0002014, whisper_loss=0.09228, over 3853535.11 frames. ], batch size: 79, lr: 7.81e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:05:18,686 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 from AS
2024-08-11 13:05:25,717 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 23 from Vox, 32 from AS
2024-08-11 13:05:29,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1111670.0, ans=0.0
2024-08-11 13:05:38,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1111770.0, ans=0.125
2024-08-11 13:06:03,914 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9750, loss[loss=0.1013, beats_loss=0.01149, ecapa_loss=0.0002085, whisper_loss=0.08775, over 19772.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01134, ecapa_loss=0.0002016, whisper_loss=0.09261, over 3865861.90 frames. ], batch size: 76, lr: 7.81e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:06:05,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1111970.0, ans=0.0
2024-08-11 13:06:08,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.596e+01 2.916e+01 3.374e+01 5.743e+01, threshold=5.832e+01, percent-clipped=0.0
2024-08-11 13:06:19,222 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 from AS
2024-08-11 13:06:45,813 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 from AS
2024-08-11 13:06:48,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0
2024-08-11 13:07:07,969 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts.
21 from LS+wenet, 23 from Vox, 31 from AS
2024-08-11 13:07:17,362 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9800, loss[loss=0.1316, beats_loss=0.01194, ecapa_loss=0.0001899, whisper_loss=0.1177, over 23280.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01129, ecapa_loss=0.0002006, whisper_loss=0.09312, over 3870262.20 frames. ], batch size: 88, lr: 7.80e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:07:27,940 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.82 vs. limit=22.5
2024-08-11 13:07:31,576 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 31 from LS+wenet, 18 from Vox, 28 from AS
2024-08-11 13:07:34,475 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.940e+05
2024-08-11 13:07:44,622 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 23 from Vox, 33 from AS
2024-08-11 13:07:46,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.40 vs. limit=22.5
2024-08-11 13:07:47,176 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 32 from LS+wenet, 20 from Vox, 28 from AS
2024-08-11 13:07:55,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1112670.0, ans=0.2
2024-08-11 13:08:00,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1112770.0, ans=0.2
2024-08-11 13:08:08,181 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 20 from Vox, 46 from AS
2024-08-11 13:08:32,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9850, loss[loss=0.09668, beats_loss=0.009631, ecapa_loss=0.0002271, whisper_loss=0.08478, over 15621.00 frames.
], tot_loss[loss=0.1064, beats_loss=0.01131, ecapa_loss=0.0002017, whisper_loss=0.09311, over 3894043.24 frames. ], batch size: 61, lr: 7.80e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:08:37,507 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.640e+01 2.920e+01 3.284e+01 5.372e+01, threshold=5.839e+01, percent-clipped=0.0
2024-08-11 13:08:59,122 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 from AS
2024-08-11 13:09:16,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=12.0
2024-08-11 13:09:16,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.02 vs. limit=10.0
2024-08-11 13:09:21,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1113270.0, ans=0.2
2024-08-11 13:09:26,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1113270.0, ans=0.0
2024-08-11 13:09:35,425 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 from AS
2024-08-11 13:09:36,533 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 from AS
2024-08-11 13:09:41,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1113370.0, ans=0.1
2024-08-11 13:09:47,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1113370.0, ans=0.015
2024-08-11 13:09:50,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9900, loss[loss=0.09769, beats_loss=0.01375, ecapa_loss=0.0001591, whisper_loss=0.08235, over 22028.00 frames.
], tot_loss[loss=0.1062, beats_loss=0.01135, ecapa_loss=0.0002009, whisper_loss=0.09288, over 3896656.73 frames. ], batch size: 87, lr: 7.80e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:09:55,891 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 from AS
2024-08-11 13:10:06,789 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 from AS
2024-08-11 13:10:09,259 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 13 from LS+wenet, 30 from Vox, 40 from AS
2024-08-11 13:10:30,955 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 11 from LS+wenet, 19 from Vox, 25 from AS
2024-08-11 13:10:41,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.06 vs. limit=22.5
2024-08-11 13:10:48,297 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 from AS
2024-08-11 13:10:52,276 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 from AS
2024-08-11 13:10:53,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1113770.0, ans=0.0
2024-08-11 13:11:10,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1113870.0, ans=0.0
2024-08-11 13:11:18,551 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 9950, loss[loss=0.07586, beats_loss=0.01098, ecapa_loss=0.0002305, whisper_loss=0.06257, over 15508.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01146, ecapa_loss=0.0001994, whisper_loss=0.09206, over 3895806.75 frames.
], batch size: 64, lr: 7.80e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:11:24,430 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.684e+01 2.921e+01 3.407e+01 1.322e+02, threshold=5.842e+01, percent-clipped=4.0
2024-08-11 13:11:42,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=22.5
2024-08-11 13:11:53,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1114170.0, ans=0.125
2024-08-11 13:11:58,092 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 19 from LS+wenet, 24 from Vox, 43 from AS
2024-08-11 13:12:05,330 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.77 vs. limit=22.5
2024-08-11 13:12:10,332 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 27 from LS+wenet, 17 from Vox, 19 from AS
2024-08-11 13:12:13,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1114270.0, ans=0.125
2024-08-11 13:12:17,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0
2024-08-11 13:12:34,937 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 from AS
2024-08-11 13:12:35,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1114370.0, ans=0.0
2024-08-11 13:12:50,190 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10000, loss[loss=0.1152, beats_loss=0.009247, ecapa_loss=0.0002521, whisper_loss=0.1035, over 17659.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01145, ecapa_loss=0.0001993, whisper_loss=0.09217, over 3897580.54 frames.
], batch size: 73, lr: 7.80e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:12:56,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1114470.0, ans=0.0
2024-08-11 13:12:57,816 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 21 from Vox, 25 from AS
2024-08-11 13:13:02,907 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 from AS
2024-08-11 13:13:04,884 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 from AS
2024-08-11 13:13:26,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1114670.0, ans=0.125
2024-08-11 13:13:40,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1114670.0, ans=0.125
2024-08-11 13:13:50,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1114770.0, ans=0.125
2024-08-11 13:14:20,580 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10050, loss[loss=0.07915, beats_loss=0.0106, ecapa_loss=0.0002289, whisper_loss=0.06627, over 21048.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01136, ecapa_loss=0.0001998, whisper_loss=0.09238, over 3862976.89 frames. ], batch size: 91, lr: 7.80e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:14:26,570 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.703e+01 2.998e+01 3.429e+01 6.033e+01, threshold=5.996e+01, percent-clipped=1.0
2024-08-11 13:14:43,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.18 vs. limit=15.0
2024-08-11 13:14:52,558 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts.
22 from LS+wenet, 23 from Vox, 33 from AS
2024-08-11 13:14:55,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0
2024-08-11 13:14:56,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1115170.0, ans=0.07
2024-08-11 13:14:59,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1115170.0, ans=0.0
2024-08-11 13:15:22,711 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 35 from LS+wenet, 22 from Vox, 29 from AS
2024-08-11 13:15:32,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1115270.0, ans=0.125
2024-08-11 13:15:38,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1115370.0, ans=0.0
2024-08-11 13:15:57,279 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10100, loss[loss=0.1206, beats_loss=0.01275, ecapa_loss=0.0001771, whisper_loss=0.106, over 23720.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01138, ecapa_loss=0.0001989, whisper_loss=0.09278, over 3883627.79 frames. ], batch size: 92, lr: 7.79e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:16:24,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1115570.0, ans=0.0
2024-08-11 13:16:53,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1115670.0, ans=0.125
2024-08-11 13:17:03,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1115770.0, ans=0.125
2024-08-11 13:17:33,502 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts.
37 from LS+wenet, 20 from Vox, 32 from AS
2024-08-11 13:17:41,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1115870.0, ans=0.0
2024-08-11 13:17:43,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1115870.0, ans=0.0
2024-08-11 13:17:45,424 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10150, loss[loss=0.08106, beats_loss=0.01155, ecapa_loss=0.000237, whisper_loss=0.06714, over 21419.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01131, ecapa_loss=0.0002009, whisper_loss=0.09298, over 3896718.75 frames. ], batch size: 92, lr: 7.79e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:17:49,650 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.757e+01 3.072e+01 3.612e+01 1.119e+02, threshold=6.144e+01, percent-clipped=1.0
2024-08-11 13:17:54,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1115970.0, ans=0.125
2024-08-11 13:17:56,506 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 from AS
2024-08-11 13:17:56,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1115970.0, ans=0.5
2024-08-11 13:18:21,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1116170.0, ans=0.125
2024-08-11 13:18:39,552 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 35 from LS+wenet, 23 from Vox, 37 from AS
2024-08-11 13:19:00,720 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10200, loss[loss=0.1073, beats_loss=0.01058, ecapa_loss=0.0002082, whisper_loss=0.09464, over 22103.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01131, ecapa_loss=0.0001999, whisper_loss=0.09287, over 3908310.60 frames.
], batch size: 87, lr: 7.79e-03, grad_scale: 1.152921504606847e+18
2024-08-11 13:19:12,984 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0
2024-08-11 13:19:18,497 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 19 from Vox, 26 from AS
2024-08-11 13:19:22,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1116570.0, ans=0.125
2024-08-11 13:19:23,462 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 19 from Vox, 30 from AS
2024-08-11 13:19:23,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1116570.0, ans=0.125
2024-08-11 13:19:32,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0
2024-08-11 13:19:40,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1116670.0, ans=0.125
2024-08-11 13:19:55,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1116770.0, ans=0.125
2024-08-11 13:20:04,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=12.0
2024-08-11 13:20:05,155 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 22 from Vox, 30 from AS
2024-08-11 13:20:19,159 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10250, loss[loss=0.1038, beats_loss=0.01176, ecapa_loss=0.0001751, whisper_loss=0.09033, over 22607.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01125, ecapa_loss=0.0001988, whisper_loss=0.09337, over 3931555.33 frames.
], batch size: 86, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:20:23,865 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.664e+01 3.001e+01 3.567e+01 5.136e+01, threshold=6.003e+01, percent-clipped=0.0 2024-08-11 13:20:36,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1117070.0, ans=0.5 2024-08-11 13:20:49,034 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 13:21:10,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1117270.0, ans=0.0 2024-08-11 13:21:14,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1117270.0, ans=0.125 2024-08-11 13:21:14,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1117270.0, ans=0.09899494936611666 2024-08-11 13:21:19,706 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-11 13:21:35,025 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 13:21:40,730 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10300, loss[loss=0.1202, beats_loss=0.008274, ecapa_loss=0.0002695, whisper_loss=0.1093, over 19153.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01125, ecapa_loss=0.0001986, whisper_loss=0.0932, over 3918693.75 frames. 
], batch size: 77, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:21:45,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1117470.0, ans=0.2 2024-08-11 13:21:45,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1117470.0, ans=0.0 2024-08-11 13:21:55,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1117470.0, ans=0.125 2024-08-11 13:21:55,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0 2024-08-11 13:22:00,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1117570.0, ans=0.0 2024-08-11 13:22:14,253 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-11 13:22:28,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1117670.0, ans=0.125 2024-08-11 13:22:29,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1117770.0, ans=0.125 2024-08-11 13:22:37,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1117770.0, ans=0.1 2024-08-11 13:23:01,784 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10350, loss[loss=0.1081, beats_loss=0.01108, ecapa_loss=0.0001601, whisper_loss=0.09542, over 22789.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01118, ecapa_loss=0.0001985, whisper_loss=0.09396, over 3921247.65 frames. 
], batch size: 88, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:23:06,378 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.796e+01 3.108e+01 3.786e+01 6.316e+01, threshold=6.215e+01, percent-clipped=1.0 2024-08-11 13:23:12,556 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 13:23:25,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1118070.0, ans=0.125 2024-08-11 13:23:42,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1118170.0, ans=0.0 2024-08-11 13:23:48,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1118270.0, ans=0.125 2024-08-11 13:23:48,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2024-08-11 13:23:51,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1118270.0, ans=0.125 2024-08-11 13:24:18,025 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10400, loss[loss=0.08472, beats_loss=0.01391, ecapa_loss=0.0001624, whisper_loss=0.06919, over 18720.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01127, ecapa_loss=0.0001975, whisper_loss=0.09317, over 3875388.23 frames. ], batch size: 73, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:24:21,568 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 13:24:21,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1118470.0, ans=0.0 2024-08-11 13:24:34,361 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-11 13:24:35,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1118570.0, ans=0.2 2024-08-11 13:24:40,117 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 13:25:00,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1118670.0, ans=0.125 2024-08-11 13:25:00,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1118670.0, ans=0.0 2024-08-11 13:25:05,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1118770.0, ans=0.125 2024-08-11 13:25:10,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1118770.0, ans=0.0 2024-08-11 13:25:35,225 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10450, loss[loss=0.08735, beats_loss=0.01405, ecapa_loss=0.0001679, whisper_loss=0.07161, over 23368.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01128, ecapa_loss=0.000198, whisper_loss=0.09284, over 3901105.54 frames. 
], batch size: 95, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:25:39,575 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.711e+01 3.019e+01 3.517e+01 4.993e+01, threshold=6.039e+01, percent-clipped=0.0 2024-08-11 13:25:46,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2024-08-11 13:25:57,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.48 vs. limit=22.5 2024-08-11 13:26:08,460 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 15 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 13:26:23,458 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 13:26:29,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1119270.0, ans=0.0 2024-08-11 13:26:45,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1119370.0, ans=0.2 2024-08-11 13:26:51,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1119370.0, ans=0.0 2024-08-11 13:26:53,848 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10500, loss[loss=0.1239, beats_loss=0.009437, ecapa_loss=0.0001979, whisper_loss=0.1124, over 19095.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01117, ecapa_loss=0.0001991, whisper_loss=0.0934, over 3892239.50 frames. ], batch size: 75, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:26:55,586 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 13:27:06,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1119470.0, ans=0.125 2024-08-11 13:27:43,062 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 13:27:57,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.35 vs. limit=22.5 2024-08-11 13:28:04,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1119870.0, ans=0.0 2024-08-11 13:28:05,522 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 13:28:11,275 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10550, loss[loss=0.1097, beats_loss=0.01014, ecapa_loss=0.0002212, whisper_loss=0.09732, over 17182.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01121, ecapa_loss=0.0001994, whisper_loss=0.09308, over 3854115.02 frames. ], batch size: 69, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:28:17,806 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.650e+01 3.072e+01 3.667e+01 9.491e+01, threshold=6.144e+01, percent-clipped=1.0 2024-08-11 13:28:18,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1119970.0, ans=0.0 2024-08-11 13:28:31,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1120070.0, ans=0.0 2024-08-11 13:28:34,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1120070.0, ans=0.0 2024-08-11 13:28:43,098 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 13:28:44,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.15 vs. limit=8.0 2024-08-11 13:28:49,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1120170.0, ans=0.2 2024-08-11 13:28:56,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1120170.0, ans=0.0 2024-08-11 13:29:11,370 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 10 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 13:29:20,188 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.61 vs. limit=15.0 2024-08-11 13:29:26,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1120370.0, ans=0.125 2024-08-11 13:29:27,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1120370.0, ans=0.0 2024-08-11 13:29:33,343 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10600, loss[loss=0.1038, beats_loss=0.01151, ecapa_loss=0.0002109, whisper_loss=0.09014, over 22423.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01124, ecapa_loss=0.0001996, whisper_loss=0.09246, over 3863947.11 frames. ], batch size: 88, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:29:33,570 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
36 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 13:29:37,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1120470.0, ans=0.125 2024-08-11 13:29:40,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1120470.0, ans=0.125 2024-08-11 13:29:44,145 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 13:29:58,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1120570.0, ans=0.125 2024-08-11 13:30:00,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1120570.0, ans=0.1 2024-08-11 13:30:05,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1120670.0, ans=0.2 2024-08-11 13:30:27,740 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 13:30:35,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1120870.0, ans=0.125 2024-08-11 13:30:37,148 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-11 13:30:50,457 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10650, loss[loss=0.1144, beats_loss=0.01067, ecapa_loss=0.0002243, whisper_loss=0.1014, over 21349.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01119, ecapa_loss=0.0001979, whisper_loss=0.09331, over 3880019.51 frames. 
], batch size: 89, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:30:57,373 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.737e+01 3.110e+01 3.500e+01 6.521e+01, threshold=6.221e+01, percent-clipped=1.0 2024-08-11 13:31:14,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1121070.0, ans=0.0 2024-08-11 13:31:14,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2024-08-11 13:31:17,938 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 37 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 13:31:18,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1121070.0, ans=0.09899494936611666 2024-08-11 13:31:19,043 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 13:31:33,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2024-08-11 13:32:02,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1121370.0, ans=0.0 2024-08-11 13:32:03,872 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 13:32:08,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1121470.0, ans=10.0 2024-08-11 13:32:10,112 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10700, loss[loss=0.1335, beats_loss=0.009698, ecapa_loss=0.0002304, whisper_loss=0.1215, over 19471.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01122, ecapa_loss=0.0001963, whisper_loss=0.09354, over 3907619.97 frames. 
], batch size: 76, lr: 7.77e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:32:14,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2024-08-11 13:32:17,561 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 19 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 13:32:25,796 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 13:32:26,316 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.18 vs. limit=10.0 2024-08-11 13:32:33,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1121570.0, ans=0.1 2024-08-11 13:32:44,843 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.02 vs. limit=22.5 2024-08-11 13:32:54,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1121670.0, ans=0.0 2024-08-11 13:33:09,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1121770.0, ans=0.125 2024-08-11 13:33:22,076 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 13:33:29,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1121970.0, ans=0.125 2024-08-11 13:33:31,034 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10750, loss[loss=0.1003, beats_loss=0.01259, ecapa_loss=0.000173, whisper_loss=0.08598, over 22183.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0113, ecapa_loss=0.0001961, whisper_loss=0.09365, over 3924785.23 frames. 
], batch size: 90, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:33:31,299 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 13:33:38,898 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.776e+01 3.070e+01 3.397e+01 5.449e+01, threshold=6.140e+01, percent-clipped=0.0 2024-08-11 13:33:43,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1121970.0, ans=0.1 2024-08-11 13:33:43,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1121970.0, ans=0.125 2024-08-11 13:33:44,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1121970.0, ans=0.1 2024-08-11 13:33:44,904 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.474e-01 2024-08-11 13:33:52,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1122070.0, ans=0.04949747468305833 2024-08-11 13:33:55,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1122070.0, ans=0.125 2024-08-11 13:34:13,901 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 13:34:35,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.51 vs. limit=22.5 2024-08-11 13:34:49,798 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10800, loss[loss=0.08418, beats_loss=0.01331, ecapa_loss=0.0002096, whisper_loss=0.06877, over 20274.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01127, ecapa_loss=0.0001958, whisper_loss=0.09322, over 3884231.80 frames. 
], batch size: 88, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:34:51,288 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 13:35:12,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2024-08-11 13:35:19,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1122670.0, ans=0.2 2024-08-11 13:35:21,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1122670.0, ans=0.1 2024-08-11 13:35:22,413 INFO [train_multi_KD3.py:844] (2/4) A total of 99 cuts. 28 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-11 13:35:22,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1122670.0, ans=0.0 2024-08-11 13:35:25,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1122670.0, ans=0.125 2024-08-11 13:35:47,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1122770.0, ans=0.2 2024-08-11 13:36:00,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1122870.0, ans=0.125 2024-08-11 13:36:06,319 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 13:36:07,430 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10850, loss[loss=0.09774, beats_loss=0.01153, ecapa_loss=0.0002224, whisper_loss=0.08398, over 18453.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01131, ecapa_loss=0.0001954, whisper_loss=0.0932, over 3892860.00 frames. 
], batch size: 77, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:36:09,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1122970.0, ans=0.0 2024-08-11 13:36:15,206 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.852e+01 3.448e+01 4.280e+01 7.389e+01, threshold=6.896e+01, percent-clipped=2.0 2024-08-11 13:36:21,409 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 13:36:27,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1123070.0, ans=0.125 2024-08-11 13:36:35,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0 2024-08-11 13:36:43,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=12.0 2024-08-11 13:36:50,822 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-11 13:37:00,171 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-11 13:37:27,730 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-11 13:37:29,109 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10900, loss[loss=0.1164, beats_loss=0.009564, ecapa_loss=0.0002799, whisper_loss=0.1041, over 20559.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01134, ecapa_loss=0.0001956, whisper_loss=0.09348, over 3902456.12 frames. ], batch size: 88, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:37:29,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1123470.0, ans=0.0 2024-08-11 13:37:32,520 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
38 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-11 13:37:32,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1123470.0, ans=0.07 2024-08-11 13:37:35,737 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 11 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 13:37:38,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1123470.0, ans=0.0 2024-08-11 13:38:08,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1123670.0, ans=0.0 2024-08-11 13:38:12,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.13 vs. limit=15.0 2024-08-11 13:38:13,322 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 13:38:41,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1123870.0, ans=0.125 2024-08-11 13:38:45,896 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 10950, loss[loss=0.1032, beats_loss=0.01208, ecapa_loss=0.0002054, whisper_loss=0.08903, over 21259.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01133, ecapa_loss=0.0001953, whisper_loss=0.09371, over 3941371.60 frames. ], batch size: 90, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:38:48,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1123970.0, ans=0.125 2024-08-11 13:38:53,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 2.774e+01 3.085e+01 3.666e+01 6.229e+01, threshold=6.171e+01, percent-clipped=0.0 2024-08-11 13:39:08,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.90 vs. 
limit=15.0 2024-08-11 13:39:09,612 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 31 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 13:39:17,212 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:39:25,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1124170.0, ans=0.0 2024-08-11 13:39:27,814 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 13:39:45,384 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 13:39:45,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1124270.0, ans=0.0 2024-08-11 13:39:52,664 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.85 vs. limit=15.0 2024-08-11 13:40:03,042 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11000, loss[loss=0.111, beats_loss=0.01004, ecapa_loss=0.0002475, whisper_loss=0.09845, over 22065.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01129, ecapa_loss=0.0001955, whisper_loss=0.09381, over 3912493.79 frames. ], batch size: 93, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:40:23,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-08-11 13:40:25,917 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 22 from LS+wenet, 31 from Vox, 42 fro AS 2024-08-11 13:40:26,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2024-08-11 13:40:31,616 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 13:40:54,160 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-11 13:41:10,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1124870.0, ans=0.1 2024-08-11 13:41:13,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1124870.0, ans=10.0 2024-08-11 13:41:22,313 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11050, loss[loss=0.08963, beats_loss=0.01267, ecapa_loss=0.0002309, whisper_loss=0.07466, over 18057.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01133, ecapa_loss=0.0001969, whisper_loss=0.09313, over 3919650.13 frames. ], batch size: 76, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:41:23,838 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 13:41:28,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1124970.0, ans=0.125 2024-08-11 13:41:29,678 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.704e+01 3.049e+01 3.665e+01 6.034e+01, threshold=6.098e+01, percent-clipped=0.0 2024-08-11 13:41:42,673 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 13:41:43,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1125070.0, ans=0.125 2024-08-11 13:41:45,703 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 13:41:52,124 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 13:42:01,732 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
26 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 13:42:07,503 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-11 13:42:18,359 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 13:42:39,108 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11100, loss[loss=0.1098, beats_loss=0.01032, ecapa_loss=0.0001882, whisper_loss=0.0976, over 18645.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01125, ecapa_loss=0.0001974, whisper_loss=0.0932, over 3914286.41 frames. ], batch size: 72, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:42:47,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1125470.0, ans=0.125 2024-08-11 13:43:03,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.00 vs. limit=22.5 2024-08-11 13:43:28,858 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0 2024-08-11 13:43:30,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1125770.0, ans=0.1 2024-08-11 13:43:36,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1125770.0, ans=0.125 2024-08-11 13:43:40,634 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 13:43:42,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1125870.0, ans=0.125 2024-08-11 13:43:44,595 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:43:48,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2024-08-11 13:43:50,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1125870.0, ans=0.0 2024-08-11 13:44:01,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11150, loss[loss=0.09117, beats_loss=0.01408, ecapa_loss=0.0001685, whisper_loss=0.07541, over 14778.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01125, ecapa_loss=0.0001969, whisper_loss=0.09345, over 3904585.49 frames. 
], batch size: 62, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:44:09,627 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.627e+01 3.035e+01 3.415e+01 6.543e+01, threshold=6.070e+01, percent-clipped=1.0 2024-08-11 13:44:10,302 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.064e+02 2024-08-11 13:44:19,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1126070.0, ans=0.2 2024-08-11 13:44:24,195 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.636e-02 2024-08-11 13:44:31,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1126170.0, ans=0.05 2024-08-11 13:44:43,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1126170.0, ans=0.125 2024-08-11 13:44:45,194 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.386e+05 2024-08-11 13:44:58,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2024-08-11 13:45:03,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1126370.0, ans=0.2 2024-08-11 13:45:15,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1126370.0, ans=0.05 2024-08-11 13:45:15,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.70 vs. 
limit=15.0 2024-08-11 13:45:16,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1126370.0, ans=0.125 2024-08-11 13:45:18,309 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11200, loss[loss=0.1151, beats_loss=0.01092, ecapa_loss=0.000178, whisper_loss=0.1024, over 21923.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01114, ecapa_loss=0.0001974, whisper_loss=0.09406, over 3892850.18 frames. ], batch size: 88, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:45:21,615 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 13:45:56,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1126670.0, ans=0.0 2024-08-11 13:46:02,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1126670.0, ans=0.1 2024-08-11 13:46:05,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1126670.0, ans=0.1 2024-08-11 13:46:14,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1126770.0, ans=0.0 2024-08-11 13:46:38,890 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 13:46:41,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.03 vs. limit=22.5 2024-08-11 13:46:42,756 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11250, loss[loss=0.117, beats_loss=0.01169, ecapa_loss=0.0001653, whisper_loss=0.1036, over 20204.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01117, ecapa_loss=0.0001971, whisper_loss=0.09373, over 3906201.35 frames. 
], batch size: 78, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:46:43,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1126970.0, ans=0.125 2024-08-11 13:46:52,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.684e+01 2.944e+01 3.546e+01 6.829e+01, threshold=5.887e+01, percent-clipped=2.0 2024-08-11 13:46:52,812 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-08-11 13:46:59,437 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-11 13:47:00,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1127070.0, ans=0.07 2024-08-11 13:47:00,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1127070.0, ans=0.0 2024-08-11 13:47:14,940 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2024-08-11 13:47:18,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1127170.0, ans=0.0 2024-08-11 13:47:33,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=1127270.0, ans=0.02 2024-08-11 13:47:34,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.01 vs. 
limit=15.0 2024-08-11 13:47:37,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1127270.0, ans=0.1 2024-08-11 13:47:53,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1127370.0, ans=0.0 2024-08-11 13:47:56,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1127370.0, ans=0.125 2024-08-11 13:47:56,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1127370.0, ans=0.2 2024-08-11 13:48:05,913 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11300, loss[loss=0.1112, beats_loss=0.01037, ecapa_loss=0.0001654, whisper_loss=0.09916, over 22729.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01119, ecapa_loss=0.0001959, whisper_loss=0.09321, over 3912041.84 frames. ], batch size: 86, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:48:08,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=1127470.0, ans=15.0 2024-08-11 13:48:27,923 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 13:48:48,642 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-11 13:49:25,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11350, loss[loss=0.07702, beats_loss=0.01381, ecapa_loss=0.0001954, whisper_loss=0.06126, over 14430.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01124, ecapa_loss=0.0001956, whisper_loss=0.09261, over 3906372.56 frames. 
], batch size: 60, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:49:28,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1127970.0, ans=0.05 2024-08-11 13:49:33,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.648e+01 3.083e+01 3.583e+01 5.645e+01, threshold=6.165e+01, percent-clipped=0.0 2024-08-11 13:49:43,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1128070.0, ans=0.125 2024-08-11 13:49:47,904 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 13:49:51,264 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-11 13:50:00,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1128170.0, ans=0.125 2024-08-11 13:50:03,085 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 13:50:11,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-11 13:50:17,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1128270.0, ans=0.125 2024-08-11 13:50:20,009 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 13:50:23,221 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 13:50:27,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1128370.0, ans=0.125 2024-08-11 13:50:38,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1128370.0, ans=0.2 2024-08-11 13:50:44,174 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11400, loss[loss=0.1153, beats_loss=0.01108, ecapa_loss=0.0001928, whisper_loss=0.1023, over 21705.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01126, ecapa_loss=0.0001949, whisper_loss=0.0928, over 3908587.94 frames. ], batch size: 84, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:50:47,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1128470.0, ans=0.125 2024-08-11 13:51:11,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1128570.0, ans=0.125 2024-08-11 13:51:16,600 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 13:51:21,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1128670.0, ans=0.0 2024-08-11 13:51:27,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1128770.0, ans=0.125 2024-08-11 13:51:32,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1128770.0, ans=15.0 2024-08-11 13:51:44,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1128870.0, ans=0.125 2024-08-11 13:51:48,636 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2024-08-11 13:51:59,164 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11450, loss[loss=0.1186, beats_loss=0.01149, ecapa_loss=0.0001856, whisper_loss=0.1053, over 22581.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01126, ecapa_loss=0.0001951, whisper_loss=0.09318, over 3907652.54 frames. ], batch size: 90, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:51:59,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1128970.0, ans=0.125 2024-08-11 13:52:07,529 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.738e+01 3.140e+01 3.413e+01 5.128e+01, threshold=6.280e+01, percent-clipped=0.0 2024-08-11 13:52:25,186 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
35 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 13:52:28,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1129070.0, ans=0.2 2024-08-11 13:52:29,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1129170.0, ans=0.1 2024-08-11 13:52:33,443 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 13:52:37,591 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 22 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-11 13:53:10,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1129370.0, ans=0.125 2024-08-11 13:53:17,921 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11500, loss[loss=0.1239, beats_loss=0.01014, ecapa_loss=0.0002201, whisper_loss=0.1116, over 22934.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01124, ecapa_loss=0.0001961, whisper_loss=0.09422, over 3940567.04 frames. ], batch size: 91, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:53:27,831 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-11 13:53:35,797 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 13:53:37,087 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 13:53:50,409 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 13:53:50,801 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:54:11,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1129770.0, ans=0.1 2024-08-11 13:54:16,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1129770.0, ans=0.0 2024-08-11 13:54:18,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1129770.0, ans=0.0 2024-08-11 13:54:23,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1129870.0, ans=0.2 2024-08-11 13:54:26,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1129870.0, ans=0.0 2024-08-11 13:54:29,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1129870.0, ans=0.0 2024-08-11 13:54:36,007 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11550, loss[loss=0.109, beats_loss=0.01252, ecapa_loss=0.0002325, whisper_loss=0.09418, over 21125.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.0112, ecapa_loss=0.0001974, whisper_loss=0.09405, over 3916540.31 frames. ], batch size: 89, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:54:36,210 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
22 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 13:54:45,185 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.847e+01 3.236e+01 3.830e+01 5.730e+01, threshold=6.473e+01, percent-clipped=0.0 2024-08-11 13:55:08,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1130070.0, ans=0.1 2024-08-11 13:55:19,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0 2024-08-11 13:55:20,620 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 13:55:23,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1130170.0, ans=0.1 2024-08-11 13:55:26,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1130170.0, ans=0.0 2024-08-11 13:55:30,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1130270.0, ans=0.125 2024-08-11 13:55:33,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1130270.0, ans=0.2 2024-08-11 13:55:40,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1130270.0, ans=10.0 2024-08-11 13:55:53,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1130370.0, ans=0.0 2024-08-11 13:55:59,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11600, loss[loss=0.08971, beats_loss=0.01358, ecapa_loss=0.0001668, whisper_loss=0.07446, over 22330.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01127, ecapa_loss=0.0001988, whisper_loss=0.09333, over 3930751.54 frames. 
], batch size: 92, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:56:01,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1130470.0, ans=0.0 2024-08-11 13:56:06,110 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 13:56:10,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1130470.0, ans=0.95 2024-08-11 13:56:13,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1130570.0, ans=0.0 2024-08-11 13:56:13,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1130570.0, ans=0.0 2024-08-11 13:56:26,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1130570.0, ans=0.2 2024-08-11 13:56:52,980 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 13:57:03,983 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 13:57:16,163 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 13:57:16,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1130970.0, ans=0.95 2024-08-11 13:57:16,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1130970.0, ans=0.0 2024-08-11 13:57:17,955 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11650, loss[loss=0.0846, beats_loss=0.01202, ecapa_loss=0.0002008, whisper_loss=0.07058, over 16586.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01127, ecapa_loss=0.0001969, whisper_loss=0.09338, over 3918761.95 frames. 
], batch size: 67, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:57:24,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1130970.0, ans=0.125 2024-08-11 13:57:24,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0 2024-08-11 13:57:26,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.647e+01 2.966e+01 3.476e+01 5.523e+01, threshold=5.933e+01, percent-clipped=0.0 2024-08-11 13:57:32,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1130970.0, ans=0.0 2024-08-11 13:57:38,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1131070.0, ans=0.125 2024-08-11 13:57:45,696 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 13:57:55,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1131170.0, ans=0.0 2024-08-11 13:58:12,466 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 13:58:28,022 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2024-08-11 13:58:35,059 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11700, loss[loss=0.1362, beats_loss=0.009256, ecapa_loss=0.0001891, whisper_loss=0.125, over 23255.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01136, ecapa_loss=0.0001965, whisper_loss=0.09368, over 3926757.26 frames. 
], batch size: 90, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:58:35,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1131470.0, ans=0.2 2024-08-11 13:58:49,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1131570.0, ans=0.125 2024-08-11 13:58:51,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1131570.0, ans=0.2 2024-08-11 13:58:56,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1131570.0, ans=0.07 2024-08-11 13:59:06,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1131670.0, ans=0.1 2024-08-11 13:59:10,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1131670.0, ans=0.07 2024-08-11 13:59:11,095 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 13:59:11,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1131670.0, ans=0.125 2024-08-11 13:59:13,396 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.01 vs. limit=6.0 2024-08-11 13:59:37,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. 
limit=6.0 2024-08-11 13:59:49,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1131870.0, ans=0.1 2024-08-11 13:59:56,820 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11750, loss[loss=0.1027, beats_loss=0.009889, ecapa_loss=0.0002283, whisper_loss=0.09057, over 21610.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01143, ecapa_loss=0.0001968, whisper_loss=0.09333, over 3928187.99 frames. ], batch size: 91, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:00:04,947 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.320e+01 2.835e+01 3.323e+01 3.805e+01 1.328e+02, threshold=6.647e+01, percent-clipped=1.0 2024-08-11 14:00:07,949 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-11 14:00:15,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1132070.0, ans=0.125 2024-08-11 14:00:21,700 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 14:00:22,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.65 vs. limit=15.0 2024-08-11 14:00:50,397 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-11 14:00:59,359 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-11 14:01:15,681 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11800, loss[loss=0.09221, beats_loss=0.01211, ecapa_loss=0.0002172, whisper_loss=0.07793, over 20818.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01144, ecapa_loss=0.000197, whisper_loss=0.09276, over 3919478.98 frames. 
], batch size: 87, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:01:22,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1132470.0, ans=0.125 2024-08-11 14:01:32,320 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-11 14:01:32,861 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:02:09,715 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 14:02:12,944 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-11 14:02:30,199 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11850, loss[loss=0.1102, beats_loss=0.01237, ecapa_loss=0.0001916, whisper_loss=0.0959, over 22239.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0114, ecapa_loss=0.0001973, whisper_loss=0.09281, over 3920044.60 frames. ], batch size: 90, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:02:30,342 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 14:02:34,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1132970.0, ans=0.1 2024-08-11 14:02:38,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.696e+01 3.020e+01 3.645e+01 5.662e+01, threshold=6.041e+01, percent-clipped=0.0 2024-08-11 14:02:44,891 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=15.0 2024-08-11 14:02:45,963 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 14:02:53,534 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 14:03:03,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1133170.0, ans=0.125 2024-08-11 14:03:07,075 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 14:03:10,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1133170.0, ans=0.05 2024-08-11 14:03:12,359 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.11 vs. limit=12.0 2024-08-11 14:03:16,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1133270.0, ans=0.125 2024-08-11 14:03:17,266 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 14:03:18,897 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 20 from LS+wenet, 21 from Vox, 51 fro AS 2024-08-11 14:03:32,915 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 14:03:38,784 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 14:03:43,563 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 17 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 14:03:46,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11900, loss[loss=0.08927, beats_loss=0.01366, ecapa_loss=0.0002078, whisper_loss=0.07354, over 21836.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01149, ecapa_loss=0.0001966, whisper_loss=0.0925, over 3916543.10 frames. ], batch size: 95, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:04:14,410 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
16 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 14:04:22,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1133670.0, ans=0.2 2024-08-11 14:04:26,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1133670.0, ans=0.2 2024-08-11 14:04:39,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1133770.0, ans=0.125 2024-08-11 14:05:04,727 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 11950, loss[loss=0.09028, beats_loss=0.01285, ecapa_loss=0.0001953, whisper_loss=0.07549, over 17846.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01144, ecapa_loss=0.0001975, whisper_loss=0.09214, over 3894234.01 frames. ], batch size: 74, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:05:10,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1133970.0, ans=0.125 2024-08-11 14:05:12,827 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.574e+01 2.891e+01 3.292e+01 6.091e+01, threshold=5.783e+01, percent-clipped=1.0 2024-08-11 14:05:20,394 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-11 14:05:23,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.76 vs. limit=22.5 2024-08-11 14:05:31,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1134070.0, ans=0.2 2024-08-11 14:05:39,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1134170.0, ans=0.0 2024-08-11 14:05:55,231 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
13 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 14:06:10,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1134370.0, ans=0.125 2024-08-11 14:06:24,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12000, loss[loss=0.09922, beats_loss=0.01251, ecapa_loss=0.000134, whisper_loss=0.08537, over 15497.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01148, ecapa_loss=0.0001957, whisper_loss=0.09146, over 3868741.06 frames. ], batch size: 57, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:06:24,387 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 14:07:03,222 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on ASR_libri: loss=0.2578, beats_loss=0, ecapa_loss=0.0006428, whisper_loss=0.2514, over 922467.00 frames. 2024-08-11 14:07:22,403 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on SV_voxceleb1: loss=0.005208, beats_loss=0, ecapa_loss=0.0005208, whisper_loss=0, over 939242.00 frames. 2024-08-11 14:07:46,543 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2885, 5.0724, 5.1432, 5.2089], device='cuda:2') 2024-08-11 14:08:10,180 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.8587, 3.1716, 3.6131, 3.5885], device='cuda:2') 2024-08-11 14:09:12,773 INFO [train_multi_KD3.py:1149] (2/4) Epoch 8, validation on AT_audioset: loss=0.02509, beats_loss=0.02509, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 14:09:12,777 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 14:09:35,441 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
28 from LS+wenet, 26 from Vox, 33 from AS 2024-08-11 14:10:09,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1134770.0, ans=0.0 2024-08-11 14:10:13,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1134870.0, ans=0.1 2024-08-11 14:10:26,310 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12050, loss[loss=0.1049, beats_loss=0.01179, ecapa_loss=0.0001824, whisper_loss=0.09128, over 14796.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01138, ecapa_loss=0.0001966, whisper_loss=0.09225, over 3848753.47 frames. ], batch size: 60, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:10:34,778 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.739e+01 2.961e+01 3.556e+01 5.317e+01, threshold=5.922e+01, percent-clipped=0.0 2024-08-11 14:10:38,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1134970.0, ans=0.04949747468305833 2024-08-11 14:10:41,559 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.11 vs. limit=22.5 2024-08-11 14:11:13,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1135270.0, ans=0.125 2024-08-11 14:11:20,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1135270.0, ans=0.0 2024-08-11 14:11:25,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1135370.0, ans=0.125 2024-08-11 14:11:28,661 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts.
21 from LS+wenet, 17 from Vox, 33 from AS 2024-08-11 14:11:30,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1135370.0, ans=0.0 2024-08-11 14:11:30,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1135370.0, ans=0.125 2024-08-11 14:11:37,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2024-08-11 14:11:42,335 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12100, loss[loss=0.1052, beats_loss=0.00945, ecapa_loss=0.0002629, whisper_loss=0.09312, over 21380.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01133, ecapa_loss=0.0001983, whisper_loss=0.09232, over 3871035.78 frames. ], batch size: 91, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:11:46,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2024-08-11 14:12:11,025 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 22 from Vox, 43 from AS 2024-08-11 14:12:19,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1135670.0, ans=0.125 2024-08-11 14:12:36,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=15.0 2024-08-11 14:12:52,136 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12150, loss[loss=0.1099, beats_loss=0.01119, ecapa_loss=0.0001721, whisper_loss=0.09696, over 19831.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01137, ecapa_loss=0.0001971, whisper_loss=0.09215, over 3857606.92 frames.
], batch size: 76, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:12:59,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.505e+01 2.843e+01 3.166e+01 1.229e+02, threshold=5.686e+01, percent-clipped=1.0 2024-08-11 14:13:02,159 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 13 from Vox, 29 from AS 2024-08-11 14:13:13,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1136070.0, ans=0.0 2024-08-11 14:13:18,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1136170.0, ans=0.09899494936611666 2024-08-11 14:13:31,968 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:13:41,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-08-11 14:13:43,595 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 35 from Vox, 34 from AS 2024-08-11 14:13:47,653 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 18 from Vox, 48 from AS 2024-08-11 14:13:54,952 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 37 from LS+wenet, 22 from Vox, 35 from AS 2024-08-11 14:13:55,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1136370.0, ans=0.1 2024-08-11 14:14:00,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12200, loss[loss=0.1096, beats_loss=0.009434, ecapa_loss=0.000265, whisper_loss=0.09751, over 16023.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01132, ecapa_loss=0.0001975, whisper_loss=0.09248, over 3853924.04 frames.
], batch size: 68, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:14:05,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1136470.0, ans=0.125 2024-08-11 14:14:16,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1136570.0, ans=0.125 2024-08-11 14:14:19,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1136570.0, ans=0.125 2024-08-11 14:14:30,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1136670.0, ans=0.05 2024-08-11 14:14:44,545 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 from AS 2024-08-11 14:14:50,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=1136770.0, ans=0.2 2024-08-11 14:14:50,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1136770.0, ans=0.025 2024-08-11 14:14:57,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1136870.0, ans=0.0 2024-08-11 14:15:01,300 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.053e-02 2024-08-11 14:15:05,353 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 26 from LS+wenet, 22 from Vox, 19 from AS 2024-08-11 14:15:05,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1136870.0, ans=0.0 2024-08-11 14:15:09,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12250, loss[loss=0.09847, beats_loss=0.01556, ecapa_loss=0.0001627, whisper_loss=0.08128, over 16935.00 frames.
], tot_loss[loss=0.1063, beats_loss=0.01125, ecapa_loss=0.000197, whisper_loss=0.09306, over 3856731.12 frames. ], batch size: 65, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:15:14,108 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.993e+02 2024-08-11 14:15:15,212 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 22 from Vox, 43 from AS 2024-08-11 14:15:16,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.710e+01 3.098e+01 3.529e+01 5.582e+01, threshold=6.197e+01, percent-clipped=0.0 2024-08-11 14:15:26,361 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 from AS 2024-08-11 14:15:40,023 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 from AS 2024-08-11 14:16:13,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1137370.0, ans=0.125 2024-08-11 14:16:16,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1137370.0, ans=0.125 2024-08-11 14:16:19,032 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12300, loss[loss=0.1316, beats_loss=0.01062, ecapa_loss=0.0002183, whisper_loss=0.1188, over 23330.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01126, ecapa_loss=0.0001974, whisper_loss=0.09322, over 3900696.88 frames.
], batch size: 92, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:16:20,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1137470.0, ans=0.125 2024-08-11 14:16:27,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1137470.0, ans=0.0 2024-08-11 14:16:32,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1137570.0, ans=0.125 2024-08-11 14:16:34,109 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 15 from Vox, 26 from AS 2024-08-11 14:16:41,157 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 from AS 2024-08-11 14:16:51,534 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 from AS 2024-08-11 14:17:06,050 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2024-08-11 14:17:08,504 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 from AS 2024-08-11 14:17:17,776 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 16 from Vox, 45 from AS 2024-08-11 14:17:18,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2024-08-11 14:17:21,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.99 vs. limit=22.5 2024-08-11 14:17:24,730 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
33 from LS+wenet, 22 from Vox, 33 from AS 2024-08-11 14:17:26,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1137870.0, ans=0.125 2024-08-11 14:17:28,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12350, loss[loss=0.1071, beats_loss=0.01073, ecapa_loss=0.0002112, whisper_loss=0.09426, over 22243.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0113, ecapa_loss=0.0001979, whisper_loss=0.09365, over 3946371.58 frames. ], batch size: 92, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:17:35,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1137970.0, ans=0.2 2024-08-11 14:17:36,213 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.078e+01 2.771e+01 3.079e+01 3.408e+01 5.279e+01, threshold=6.158e+01, percent-clipped=0.0 2024-08-11 14:17:44,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1138070.0, ans=0.125 2024-08-11 14:17:51,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1138070.0, ans=0.125 2024-08-11 14:17:52,491 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 36 from LS+wenet, 27 from Vox, 24 from AS 2024-08-11 14:17:57,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.29 vs. limit=22.5 2024-08-11 14:18:07,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1138170.0, ans=0.2 2024-08-11 14:18:41,595 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12400, loss[loss=0.1097, beats_loss=0.01034, ecapa_loss=0.0001999, whisper_loss=0.09741, over 21850.00 frames.
], tot_loss[loss=0.1072, beats_loss=0.01129, ecapa_loss=0.0001966, whisper_loss=0.09394, over 3950750.22 frames. ], batch size: 88, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:18:44,567 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 from AS 2024-08-11 14:18:51,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1138470.0, ans=0.125 2024-08-11 14:19:01,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2024-08-11 14:19:04,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1138570.0, ans=0.125 2024-08-11 14:19:06,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1138570.0, ans=0.1 2024-08-11 14:19:14,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1138670.0, ans=0.125 2024-08-11 14:19:23,984 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 19 from Vox, 37 from AS 2024-08-11 14:19:31,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.48 vs. limit=10.0 2024-08-11 14:19:35,165 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 from AS 2024-08-11 14:19:52,005 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12450, loss[loss=0.1017, beats_loss=0.01126, ecapa_loss=0.0001449, whisper_loss=0.08896, over 14619.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0113, ecapa_loss=0.0001967, whisper_loss=0.09376, over 3944500.40 frames.
], batch size: 56, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:19:54,361 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.76 vs. limit=15.0 2024-08-11 14:19:59,838 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.783e+01 3.134e+01 3.561e+01 9.376e+01, threshold=6.268e+01, percent-clipped=1.0 2024-08-11 14:20:33,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1139270.0, ans=0.125 2024-08-11 14:20:51,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1139370.0, ans=22.5 2024-08-11 14:20:53,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1139370.0, ans=0.04949747468305833 2024-08-11 14:21:02,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1139370.0, ans=0.2 2024-08-11 14:21:04,883 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12500, loss[loss=0.1171, beats_loss=0.01167, ecapa_loss=0.0001967, whisper_loss=0.1035, over 22641.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0113, ecapa_loss=0.0001964, whisper_loss=0.09323, over 3903746.47 frames. ], batch size: 91, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:21:04,998 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 from AS 2024-08-11 14:21:20,889 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts.
20 from LS+wenet, 14 from Vox, 32 from AS 2024-08-11 14:21:23,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1139570.0, ans=10.0 2024-08-11 14:21:44,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1139670.0, ans=0.2 2024-08-11 14:21:46,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1139670.0, ans=0.1 2024-08-11 14:21:54,611 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 from AS 2024-08-11 14:22:01,344 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 from AS 2024-08-11 14:22:04,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1139870.0, ans=0.1 2024-08-11 14:22:07,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.71 vs. limit=22.5 2024-08-11 14:22:15,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1139870.0, ans=0.04949747468305833 2024-08-11 14:22:20,513 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12550, loss[loss=0.1018, beats_loss=0.01062, ecapa_loss=0.0002383, whisper_loss=0.08884, over 20042.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01128, ecapa_loss=0.0001956, whisper_loss=0.09323, over 3905565.42 frames.
], batch size: 86, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:22:22,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1139970.0, ans=0.1 2024-08-11 14:22:27,655 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.780e+01 3.157e+01 3.733e+01 7.024e+01, threshold=6.315e+01, percent-clipped=2.0 2024-08-11 14:22:33,461 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 from AS 2024-08-11 14:22:41,991 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 15 from Vox, 37 from AS 2024-08-11 14:22:44,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.05 vs. limit=22.5 2024-08-11 14:22:55,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1140170.0, ans=0.125 2024-08-11 14:23:05,471 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 25 from Vox, 33 from AS 2024-08-11 14:23:07,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1140270.0, ans=0.05 2024-08-11 14:23:07,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1140270.0, ans=0.125 2024-08-11 14:23:10,467 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 13 from Vox, 23 from AS 2024-08-11 14:23:19,144 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 24 from Vox, 30 from AS 2024-08-11 14:23:26,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.21 vs. limit=15.0 2024-08-11 14:23:27,464 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
18 from LS+wenet, 32 from Vox, 41 from AS 2024-08-11 14:23:34,298 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12600, loss[loss=0.08381, beats_loss=0.01281, ecapa_loss=0.0001875, whisper_loss=0.06912, over 15447.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01133, ecapa_loss=0.0001968, whisper_loss=0.09342, over 3905464.79 frames. ], batch size: 62, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:23:42,028 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:23:44,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1140470.0, ans=0.0 2024-08-11 14:23:57,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1140570.0, ans=0.0 2024-08-11 14:24:19,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1140770.0, ans=0.125 2024-08-11 14:24:22,439 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 from AS 2024-08-11 14:24:30,239 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 19 from LS+wenet, 19 from Vox, 46 from AS 2024-08-11 14:24:48,523 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12650, loss[loss=0.1102, beats_loss=0.009716, ecapa_loss=0.0002282, whisper_loss=0.09819, over 21945.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01135, ecapa_loss=0.0001969, whisper_loss=0.09296, over 3883768.49 frames. ], batch size: 88, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:24:55,233 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.818e+01 3.225e+01 3.809e+01 6.974e+01, threshold=6.451e+01, percent-clipped=1.0 2024-08-11 14:24:55,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs.
limit=15.0 2024-08-11 14:24:59,264 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 19 from LS+wenet, 11 from Vox, 23 from AS 2024-08-11 14:25:15,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1141070.0, ans=0.125 2024-08-11 14:25:24,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2024-08-11 14:25:26,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1141170.0, ans=0.125 2024-08-11 14:25:34,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1141270.0, ans=0.2 2024-08-11 14:25:40,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1141270.0, ans=0.125 2024-08-11 14:25:46,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1141370.0, ans=0.1 2024-08-11 14:25:47,378 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 23 from Vox, 28 from AS 2024-08-11 14:25:51,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1141370.0, ans=0.1 2024-08-11 14:26:00,985 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12700, loss[loss=0.1022, beats_loss=0.01127, ecapa_loss=0.0002619, whisper_loss=0.08832, over 21352.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01131, ecapa_loss=0.0001983, whisper_loss=0.09292, over 3851227.35 frames. ], batch size: 91, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:26:11,811 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
29 from LS+wenet, 22 from Vox, 43 from AS 2024-08-11 14:26:21,559 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 from AS 2024-08-11 14:26:29,476 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.58 vs. limit=15.0 2024-08-11 14:26:31,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1141670.0, ans=0.125 2024-08-11 14:26:39,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1141670.0, ans=0.125 2024-08-11 14:26:46,375 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 15 from Vox, 38 from AS 2024-08-11 14:26:55,686 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 from AS 2024-08-11 14:26:57,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.00 vs. limit=15.0 2024-08-11 14:27:01,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1141870.0, ans=0.0 2024-08-11 14:27:01,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1141870.0, ans=0.04949747468305833 2024-08-11 14:27:09,378 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 from AS 2024-08-11 14:27:10,619 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12750, loss[loss=0.1069, beats_loss=0.01043, ecapa_loss=0.0002021, whisper_loss=0.0944, over 22582.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01128, ecapa_loss=0.0001991, whisper_loss=0.09335, over 3865785.44 frames.
], batch size: 92, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:27:14,050 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:27:14,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1141970.0, ans=0.125 2024-08-11 14:27:17,375 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+01 2.661e+01 2.986e+01 3.443e+01 7.051e+01, threshold=5.972e+01, percent-clipped=1.0 2024-08-11 14:27:29,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1142070.0, ans=0.1 2024-08-11 14:27:37,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1142170.0, ans=0.1 2024-08-11 14:27:37,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1142170.0, ans=0.125 2024-08-11 14:27:42,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1142170.0, ans=0.125 2024-08-11 14:27:47,603 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
23 from LS+wenet, 17 from Vox, 33 from AS 2024-08-11 14:27:51,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1142270.0, ans=0.1 2024-08-11 14:27:54,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1142270.0, ans=0.2 2024-08-11 14:27:58,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1142270.0, ans=0.2 2024-08-11 14:28:00,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1142270.0, ans=0.0 2024-08-11 14:28:07,327 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.34 vs. limit=22.5 2024-08-11 14:28:20,515 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12800, loss[loss=0.1065, beats_loss=0.0135, ecapa_loss=0.0002033, whisper_loss=0.091, over 20785.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01144, ecapa_loss=0.0001974, whisper_loss=0.09302, over 3889295.54 frames. ], batch size: 87, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:28:33,785 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2024-08-11 14:28:43,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1142570.0, ans=0.1 2024-08-11 14:29:01,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2024-08-11 14:29:31,891 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12850, loss[loss=0.09141, beats_loss=0.01297, ecapa_loss=0.0002198, whisper_loss=0.07624, over 14201.00 frames.
], tot_loss[loss=0.106, beats_loss=0.01145, ecapa_loss=0.0001978, whisper_loss=0.09255, over 3874552.18 frames. ], batch size: 58, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:29:32,099 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 28 from Vox, 37 from AS 2024-08-11 14:29:38,556 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+01 2.679e+01 2.923e+01 3.402e+01 6.033e+01, threshold=5.846e+01, percent-clipped=2.0 2024-08-11 14:30:22,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1143270.0, ans=0.125 2024-08-11 14:30:29,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1143370.0, ans=0.125 2024-08-11 14:30:40,470 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.64 vs. limit=15.0 2024-08-11 14:30:40,840 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12900, loss[loss=0.1173, beats_loss=0.01262, ecapa_loss=0.0001668, whisper_loss=0.103, over 23630.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01143, ecapa_loss=0.0001995, whisper_loss=0.09284, over 3882000.37 frames.
], batch size: 90, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:30:42,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1143470.0, ans=0.035 2024-08-11 14:30:49,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1143470.0, ans=0.125 2024-08-11 14:30:53,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1143570.0, ans=0.0 2024-08-11 14:30:54,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1143570.0, ans=0.1 2024-08-11 14:31:22,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1143770.0, ans=0.0 2024-08-11 14:31:26,114 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2024-08-11 14:31:28,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1143770.0, ans=0.0 2024-08-11 14:31:39,079 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 28 from Vox, 32 from AS 2024-08-11 14:31:45,708 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 23 from LS+wenet, 13 from Vox, 19 from AS 2024-08-11 14:31:48,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 12950, loss[loss=0.1019, beats_loss=0.01093, ecapa_loss=0.0002314, whisper_loss=0.08867, over 19626.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01131, ecapa_loss=0.0002002, whisper_loss=0.09258, over 3873721.99 frames.
], batch size: 84, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:31:54,990 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.619e+01 2.896e+01 3.261e+01 4.562e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-11 14:32:00,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1144070.0, ans=0.07 2024-08-11 14:32:18,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-11 14:32:28,466 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.204e-01 2024-08-11 14:32:31,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.54 vs. limit=22.5 2024-08-11 14:32:36,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1144270.0, ans=0.0 2024-08-11 14:32:42,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1144370.0, ans=0.125 2024-08-11 14:32:51,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=6.0 2024-08-11 14:32:55,031 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13000, loss[loss=0.1014, beats_loss=0.01281, ecapa_loss=0.0001744, whisper_loss=0.0868, over 22955.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0113, ecapa_loss=0.0001986, whisper_loss=0.0929, over 3827315.24 frames. ], batch size: 91, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:32:55,161 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 14:33:38,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.04 vs. limit=15.0 2024-08-11 14:33:41,698 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 14:33:42,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1144770.0, ans=0.0 2024-08-11 14:33:44,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1144770.0, ans=0.0 2024-08-11 14:33:54,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1144870.0, ans=0.1 2024-08-11 14:34:01,644 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13050, loss[loss=0.1303, beats_loss=0.01017, ecapa_loss=0.0001684, whisper_loss=0.1185, over 22399.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01128, ecapa_loss=0.0001969, whisper_loss=0.09331, over 3838813.30 frames. ], batch size: 87, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:34:03,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1144970.0, ans=0.09899494936611666 2024-08-11 14:34:06,617 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 14:34:09,118 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.663e+01 3.009e+01 3.543e+01 5.736e+01, threshold=6.018e+01, percent-clipped=0.0 2024-08-11 14:34:09,314 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 14:34:10,616 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
22 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-11 14:34:24,466 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 14:35:03,618 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 14:35:05,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.87 vs. limit=6.0 2024-08-11 14:35:08,574 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13100, loss[loss=0.1024, beats_loss=0.008971, ecapa_loss=0.0002778, whisper_loss=0.09067, over 14646.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01131, ecapa_loss=0.0001973, whisper_loss=0.09301, over 3821137.85 frames. ], batch size: 59, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:36:07,168 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 14:36:16,080 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13150, loss[loss=0.1267, beats_loss=0.01128, ecapa_loss=0.0002048, whisper_loss=0.1134, over 17955.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01126, ecapa_loss=0.0001957, whisper_loss=0.0933, over 3789719.81 frames. ], batch size: 68, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:36:24,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.646e+01 3.074e+01 3.551e+01 7.415e+01, threshold=6.148e+01, percent-clipped=1.0 2024-08-11 14:36:50,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1146170.0, ans=0.0 2024-08-11 14:36:55,874 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 10 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 14:37:17,103 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 14:37:25,073 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13200, loss[loss=0.1244, beats_loss=0.009592, ecapa_loss=0.0002145, whisper_loss=0.1127, over 22712.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01128, ecapa_loss=0.0001962, whisper_loss=0.09267, over 3769626.49 frames. ], batch size: 89, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:37:32,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1146470.0, ans=0.1 2024-08-11 14:37:46,771 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 14:37:52,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.87 vs. limit=10.0 2024-08-11 14:37:53,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1146670.0, ans=0.125 2024-08-11 14:37:55,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.52 vs. limit=15.0 2024-08-11 14:38:05,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1146770.0, ans=0.125 2024-08-11 14:38:07,632 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. 
limit=15.0 2024-08-11 14:38:16,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1146770.0, ans=0.0 2024-08-11 14:38:21,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1146870.0, ans=0.125 2024-08-11 14:38:29,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1146870.0, ans=0.0 2024-08-11 14:38:30,457 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 14:38:31,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13250, loss[loss=0.09908, beats_loss=0.01072, ecapa_loss=0.0002313, whisper_loss=0.08605, over 22003.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01125, ecapa_loss=0.0001983, whisper_loss=0.09266, over 3783258.90 frames. ], batch size: 93, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:38:31,741 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 14:38:39,875 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.719e+01 3.002e+01 3.497e+01 5.724e+01, threshold=6.004e+01, percent-clipped=0.0 2024-08-11 14:38:41,373 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 14:38:51,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1147070.0, ans=0.2 2024-08-11 14:39:00,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1147170.0, ans=0.125 2024-08-11 14:39:04,374 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.88 vs. 
limit=22.5 2024-08-11 14:39:06,781 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 16 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 14:39:08,157 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 14:39:12,366 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-11 14:39:13,496 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 29 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 14:39:38,732 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13300, loss[loss=0.09797, beats_loss=0.01161, ecapa_loss=0.0001718, whisper_loss=0.08464, over 21969.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0112, ecapa_loss=0.0001986, whisper_loss=0.09286, over 3820436.08 frames. ], batch size: 87, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:40:13,478 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 14:40:16,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-11 14:40:22,384 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 14:40:22,794 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.32 vs. limit=15.0 2024-08-11 14:40:23,668 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 14:40:29,191 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
24 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 14:40:32,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1147870.0, ans=0.2 2024-08-11 14:40:44,586 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13350, loss[loss=0.08207, beats_loss=0.01658, ecapa_loss=0.0001653, whisper_loss=0.06383, over 14115.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01121, ecapa_loss=0.0001967, whisper_loss=0.0928, over 3807070.70 frames. ], batch size: 59, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:40:48,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1147970.0, ans=0.2 2024-08-11 14:40:53,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.881e+01 3.191e+01 3.673e+01 5.435e+01, threshold=6.381e+01, percent-clipped=0.0 2024-08-11 14:41:03,486 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2024-08-11 14:41:08,782 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-11 14:41:11,210 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 14:41:11,677 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0 2024-08-11 14:41:17,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2024-08-11 14:41:20,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1148170.0, ans=0.125 2024-08-11 14:41:34,089 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
33 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 14:41:37,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1148270.0, ans=0.125 2024-08-11 14:41:52,696 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13400, loss[loss=0.08973, beats_loss=0.01228, ecapa_loss=0.0001834, whisper_loss=0.07562, over 22637.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01119, ecapa_loss=0.0001992, whisper_loss=0.09238, over 3801509.51 frames. ], batch size: 90, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:41:54,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1148470.0, ans=0.0 2024-08-11 14:41:59,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1148470.0, ans=0.125 2024-08-11 14:42:08,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1148570.0, ans=0.125 2024-08-11 14:42:14,073 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 14:42:16,293 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.01 vs. limit=10.0 2024-08-11 14:42:36,882 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 14:42:52,740 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.467e-01 2024-08-11 14:42:55,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1148870.0, ans=0.2 2024-08-11 14:42:59,174 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13450, loss[loss=0.08895, beats_loss=0.01407, ecapa_loss=0.0001766, whisper_loss=0.07312, over 22225.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01131, ecapa_loss=0.0001978, whisper_loss=0.09225, over 3853328.52 frames. ], batch size: 94, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:43:00,706 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 14:43:01,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1148970.0, ans=0.125 2024-08-11 14:43:02,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1148970.0, ans=0.125 2024-08-11 14:43:07,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.672e+01 2.998e+01 3.496e+01 5.811e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 14:43:11,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.99 vs. limit=22.5 2024-08-11 14:43:14,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1149070.0, ans=0.125 2024-08-11 14:43:34,140 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 14:43:44,761 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-11 14:43:56,263 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 14:44:01,953 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 14:44:06,849 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13500, loss[loss=0.07945, beats_loss=0.009464, ecapa_loss=0.0001977, whisper_loss=0.06801, over 16880.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01117, ecapa_loss=0.0001992, whisper_loss=0.09298, over 3866227.64 frames. ], batch size: 64, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:44:11,133 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-11 14:44:19,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1149570.0, ans=0.125 2024-08-11 14:44:27,001 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0 2024-08-11 14:44:37,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1149670.0, ans=0.0 2024-08-11 14:44:42,160 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:44:46,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1149770.0, ans=0.1 2024-08-11 14:44:46,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. 
limit=15.0 2024-08-11 14:44:50,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.58 vs. limit=15.0 2024-08-11 14:44:58,972 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 14:45:08,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1149870.0, ans=0.025 2024-08-11 14:45:13,748 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13550, loss[loss=0.1137, beats_loss=0.01179, ecapa_loss=0.000214, whisper_loss=0.09976, over 20020.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01124, ecapa_loss=0.0001987, whisper_loss=0.09304, over 3891929.06 frames. ], batch size: 82, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:45:22,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.724e+01 3.026e+01 3.356e+01 6.368e+01, threshold=6.052e+01, percent-clipped=1.0 2024-08-11 14:45:23,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=12.0 2024-08-11 14:45:26,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1150070.0, ans=0.125 2024-08-11 14:45:27,371 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=15.02 vs. limit=15.0 2024-08-11 14:45:29,224 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
17 from LS+wenet, 24 from Vox, 50 fro AS 2024-08-11 14:45:29,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1150070.0, ans=0.125 2024-08-11 14:45:40,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1150170.0, ans=0.1 2024-08-11 14:45:41,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1150170.0, ans=0.125 2024-08-11 14:45:46,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1150170.0, ans=0.2 2024-08-11 14:46:04,820 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 14:46:06,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1150370.0, ans=0.1 2024-08-11 14:46:20,795 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13600, loss[loss=0.1032, beats_loss=0.01144, ecapa_loss=0.0002355, whisper_loss=0.08942, over 19366.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01129, ecapa_loss=0.0001971, whisper_loss=0.09288, over 3888652.44 frames. ], batch size: 84, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:46:21,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1150470.0, ans=0.125 2024-08-11 14:46:55,897 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 14:46:59,785 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 14:47:08,697 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
34 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 14:47:10,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1150770.0, ans=0.125 2024-08-11 14:47:13,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1150870.0, ans=0.125 2024-08-11 14:47:15,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1150870.0, ans=0.0 2024-08-11 14:47:27,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13650, loss[loss=0.1041, beats_loss=0.01415, ecapa_loss=0.0001901, whisper_loss=0.088, over 18864.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01133, ecapa_loss=0.0001967, whisper_loss=0.09279, over 3853320.96 frames. ], batch size: 76, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:47:34,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.222e+01 2.952e+01 3.395e+01 3.813e+01 5.359e+01, threshold=6.790e+01, percent-clipped=0.0 2024-08-11 14:47:37,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1150970.0, ans=0.125 2024-08-11 14:47:49,661 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 14:47:54,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1151170.0, ans=0.04949747468305833 2024-08-11 14:48:14,618 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-11 14:48:34,084 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13700, loss[loss=0.06902, beats_loss=0.00991, ecapa_loss=0.0001772, whisper_loss=0.05734, over 16357.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0113, ecapa_loss=0.0001967, whisper_loss=0.09326, over 3854538.76 frames. 
], batch size: 64, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:48:35,898 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 14:48:44,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1151470.0, ans=0.2 2024-08-11 14:48:48,273 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 32 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-11 14:49:04,002 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 14:49:05,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.97 vs. limit=22.5 2024-08-11 14:49:37,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1151870.0, ans=0.125 2024-08-11 14:49:41,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13750, loss[loss=0.09442, beats_loss=0.01133, ecapa_loss=0.0001801, whisper_loss=0.08128, over 21261.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01135, ecapa_loss=0.0001973, whisper_loss=0.09284, over 3871168.41 frames. ], batch size: 82, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:49:49,570 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.564e+01 2.884e+01 3.394e+01 1.263e+02, threshold=5.769e+01, percent-clipped=1.0 2024-08-11 14:50:38,201 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-11 14:50:39,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1152370.0, ans=0.125 2024-08-11 14:50:48,442 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13800, loss[loss=0.1164, beats_loss=0.00842, ecapa_loss=0.0001973, whisper_loss=0.106, over 16594.00 frames. 
], tot_loss[loss=0.1065, beats_loss=0.01135, ecapa_loss=0.0001962, whisper_loss=0.09315, over 3890454.57 frames. ], batch size: 63, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:50:50,644 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2024-08-11 14:50:52,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1152470.0, ans=0.2 2024-08-11 14:51:02,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1152570.0, ans=0.0 2024-08-11 14:51:31,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1152770.0, ans=0.125 2024-08-11 14:51:38,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1152770.0, ans=0.125 2024-08-11 14:51:43,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1152870.0, ans=0.125 2024-08-11 14:51:55,138 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13850, loss[loss=0.09114, beats_loss=0.01058, ecapa_loss=0.0001373, whisper_loss=0.07919, over 18510.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01137, ecapa_loss=0.000195, whisper_loss=0.09305, over 3906073.97 frames. ], batch size: 67, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:52:02,144 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 14:52:03,066 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.659e+01 3.124e+01 3.574e+01 6.862e+01, threshold=6.248e+01, percent-clipped=1.0 2024-08-11 14:52:04,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1152970.0, ans=0.0 2024-08-11 14:52:08,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2024-08-11 14:52:21,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1153170.0, ans=0.125 2024-08-11 14:52:22,574 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.69 vs. limit=10.0 2024-08-11 14:52:39,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.20 vs. limit=15.0 2024-08-11 14:52:51,040 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 14:53:01,587 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13900, loss[loss=0.1176, beats_loss=0.00762, ecapa_loss=0.00018, whisper_loss=0.1082, over 16786.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01134, ecapa_loss=0.0001937, whisper_loss=0.09325, over 3870589.98 frames. ], batch size: 59, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:53:06,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1153470.0, ans=0.125 2024-08-11 14:53:09,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. 
limit=15.0 2024-08-11 14:53:17,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1153570.0, ans=0.125 2024-08-11 14:53:26,842 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 14:53:32,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1153670.0, ans=0.125 2024-08-11 14:53:43,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1153770.0, ans=0.0 2024-08-11 14:53:52,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1153770.0, ans=0.0 2024-08-11 14:53:53,358 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 14:53:59,738 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 17 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 14:54:07,691 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 13950, loss[loss=0.09136, beats_loss=0.01132, ecapa_loss=0.000171, whisper_loss=0.07834, over 14670.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01132, ecapa_loss=0.000195, whisper_loss=0.09292, over 3871528.19 frames. ], batch size: 57, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:54:11,914 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 14:54:15,667 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.781e+01 3.096e+01 3.577e+01 5.485e+01, threshold=6.193e+01, percent-clipped=0.0 2024-08-11 14:54:28,118 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
22 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 14:54:48,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1154270.0, ans=0.1 2024-08-11 14:54:51,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1154270.0, ans=0.2 2024-08-11 14:54:56,047 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=15.0 2024-08-11 14:54:57,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1154270.0, ans=0.0 2024-08-11 14:55:01,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5 2024-08-11 14:55:14,359 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.88 vs. limit=22.5 2024-08-11 14:55:16,112 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 14000, loss[loss=0.09823, beats_loss=0.01109, ecapa_loss=0.0001848, whisper_loss=0.08529, over 17923.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01128, ecapa_loss=0.0001953, whisper_loss=0.09303, over 3866824.74 frames. ], batch size: 72, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:55:41,301 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 14:55:56,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. 
limit=15.0 2024-08-11 14:56:01,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1154770.0, ans=0.0 2024-08-11 14:56:22,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1154870.0, ans=10.0 2024-08-11 14:56:24,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1154870.0, ans=0.125 2024-08-11 14:56:27,403 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 14050, loss[loss=0.1049, beats_loss=0.01141, ecapa_loss=0.0002073, whisper_loss=0.0914, over 21988.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0113, ecapa_loss=0.0001949, whisper_loss=0.09341, over 3872942.92 frames. ], batch size: 89, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:56:36,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.747e+01 3.034e+01 3.556e+01 6.486e+01, threshold=6.067e+01, percent-clipped=1.0 2024-08-11 14:56:40,570 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 14:57:11,639 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 14:57:25,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1155270.0, ans=0.125 2024-08-11 14:57:27,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1155370.0, ans=0.125 2024-08-11 14:57:40,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1155370.0, ans=0.0 2024-08-11 14:57:41,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1155370.0, ans=0.0 2024-08-11 14:57:43,619 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 14100, loss[loss=0.1261, beats_loss=0.009881, ecapa_loss=0.0001889, whisper_loss=0.1143, over 23547.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01126, ecapa_loss=0.000195, whisper_loss=0.09405, over 3880018.27 frames. ], batch size: 90, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:57:49,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1155470.0, ans=0.125 2024-08-11 14:57:49,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1155470.0, ans=0.125 2024-08-11 14:57:50,202 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 14:57:51,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1155470.0, ans=0.04949747468305833 2024-08-11 14:58:01,543 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
21 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-11 14:58:05,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1155570.0, ans=0.125 2024-08-11 14:58:06,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1155570.0, ans=0.0 2024-08-11 14:58:11,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1155570.0, ans=0.125 2024-08-11 14:58:14,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1155670.0, ans=0.125 2024-08-11 14:58:26,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1155670.0, ans=0.125 2024-08-11 14:58:32,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1155770.0, ans=0.125 2024-08-11 14:58:41,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1155770.0, ans=0.125 2024-08-11 14:58:54,254 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 14:58:56,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1155870.0, ans=0.125 2024-08-11 14:58:59,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 14150, loss[loss=0.08437, beats_loss=0.01406, ecapa_loss=0.0001741, whisper_loss=0.06858, over 16311.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01137, ecapa_loss=0.0001927, whisper_loss=0.09385, over 3891293.45 frames. ], batch size: 65, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:59:04,062 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
30 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 14:59:08,701 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.682e+01 3.045e+01 3.525e+01 6.405e+01, threshold=6.090e+01, percent-clipped=1.0 2024-08-11 14:59:10,741 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.76 vs. limit=22.5 2024-08-11 14:59:24,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1156070.0, ans=15.0 2024-08-11 15:00:08,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1156370.0, ans=0.1 2024-08-11 15:00:14,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.33 vs. limit=22.5 2024-08-11 15:00:17,356 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 14200, loss[loss=0.09416, beats_loss=0.009937, ecapa_loss=0.0003064, whisper_loss=0.08116, over 20717.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01129, ecapa_loss=0.0001931, whisper_loss=0.09426, over 3918919.86 frames. ], batch size: 92, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:00:17,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1156470.0, ans=0.2 2024-08-11 15:00:34,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1156570.0, ans=0.125 2024-08-11 15:00:58,659 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
35 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 15:01:04,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1156770.0, ans=0.125 2024-08-11 15:01:16,659 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 15:01:30,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2024-08-11 15:01:32,801 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 14250, loss[loss=0.09843, beats_loss=0.01049, ecapa_loss=0.0002231, whisper_loss=0.0857, over 21718.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01123, ecapa_loss=0.0001959, whisper_loss=0.09421, over 3907825.85 frames. ], batch size: 89, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:01:43,339 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.820e+01 3.214e+01 3.813e+01 8.671e+01, threshold=6.428e+01, percent-clipped=3.0 2024-08-11 15:02:06,987 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-11 15:02:07,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1157170.0, ans=0.0 2024-08-11 15:02:09,771 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5 2024-08-11 15:02:26,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1157270.0, ans=0.2 2024-08-11 15:02:27,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.38 vs. 
limit=12.0 2024-08-11 15:02:30,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1157270.0, ans=0.2 2024-08-11 15:02:30,279 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.86 vs. limit=10.0 2024-08-11 15:02:52,818 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 14300, loss[loss=0.1332, beats_loss=0.009455, ecapa_loss=0.0002079, whisper_loss=0.1216, over 18322.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01128, ecapa_loss=0.0001962, whisper_loss=0.09398, over 3881897.62 frames. ], batch size: 72, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:03:04,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. limit=10.0 2024-08-11 15:03:04,918 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 15:03:08,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1157570.0, ans=0.1 2024-08-11 15:03:09,011 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.770e+00 2024-08-11 15:03:15,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1157570.0, ans=0.025 2024-08-11 15:03:17,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.44 vs. 
limit=10.0 2024-08-11 15:03:23,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1157670.0, ans=0.125 2024-08-11 15:03:29,311 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.61 vs. limit=22.5 2024-08-11 15:03:32,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1157670.0, ans=0.125 2024-08-11 15:03:36,584 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2024-08-11 15:03:50,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=25.07 vs. limit=15.0 2024-08-11 15:03:53,311 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 15:03:53,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=1157870.0, ans=0.02 2024-08-11 15:03:58,869 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-11 15:03:59,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1157870.0, ans=0.2 2024-08-11 15:04:07,913 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 14350, loss[loss=0.1173, beats_loss=0.01315, ecapa_loss=0.0001621, whisper_loss=0.1025, over 18895.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0113, ecapa_loss=0.000195, whisper_loss=0.09345, over 3875987.91 frames. 
], batch size: 72, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:04:08,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1157970.0, ans=0.0 2024-08-11 15:04:12,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1157970.0, ans=0.1 2024-08-11 15:04:16,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 2.896e+01 3.266e+01 3.801e+01 1.000e+02, threshold=6.532e+01, percent-clipped=2.0 2024-08-11 15:04:19,877 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 15:04:29,281 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-08-11 15:04:29,954 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 15:04:41,845 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-11 15:04:52,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1158270.0, ans=15.0 2024-08-11 15:05:02,064 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 15:05:06,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1158370.0, ans=0.125 2024-08-11 15:05:08,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.76 vs. 
limit=15.0 2024-08-11 15:05:19,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1158370.0, ans=0.0 2024-08-11 15:05:21,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1158370.0, ans=0.2 2024-08-11 15:05:23,710 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 14400, loss[loss=0.1143, beats_loss=0.01241, ecapa_loss=0.0001913, whisper_loss=0.09994, over 23048.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01137, ecapa_loss=0.0001954, whisper_loss=0.09259, over 3921640.00 frames. ], batch size: 94, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:05:30,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1158470.0, ans=0.0 2024-08-11 15:05:41,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1158570.0, ans=0.125 2024-08-11 15:05:41,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1158570.0, ans=0.125 2024-08-11 15:05:41,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1158570.0, ans=0.125 2024-08-11 15:05:42,061 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=12.0 2024-08-11 15:05:46,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-08-11 15:06:00,930 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-11 15:06:02,550 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
34 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 15:06:05,404 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 15:06:14,422 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 15:06:32,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1158870.0, ans=0.035 2024-08-11 15:06:36,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1158870.0, ans=0.0 2024-08-11 15:06:39,118 INFO [train_multi_KD3.py:1116] (2/4) Epoch 8, batch 14450, loss[loss=0.0947, beats_loss=0.01404, ecapa_loss=0.0001863, whisper_loss=0.0788, over 21423.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01132, ecapa_loss=0.0001955, whisper_loss=0.09303, over 3910916.97 frames. ], batch size: 90, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:06:39,999 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.50 vs. limit=22.5 2024-08-11 15:06:43,855 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
35 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-11 15:06:46,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1158970.0, ans=0.125 2024-08-11 15:06:48,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.150e+01 2.733e+01 3.088e+01 3.504e+01 7.570e+01, threshold=6.176e+01, percent-clipped=1.0 2024-08-11 15:06:52,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1158970.0, ans=0.125 2024-08-11 15:07:11,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1159170.0, ans=0.125 2024-08-11 15:07:12,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1159170.0, ans=0.0 2024-08-11 15:07:28,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1159270.0, ans=0.05 2024-08-11 15:07:31,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=15.0 2024-08-11 15:08:19,722 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 0, loss[loss=0.09144, beats_loss=0.0131, ecapa_loss=0.0001596, whisper_loss=0.07674, over 21124.00 frames. ], tot_loss[loss=0.09144, beats_loss=0.0131, ecapa_loss=0.0001596, whisper_loss=0.07674, over 21124.00 frames. ], batch size: 81, lr: 7.24e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:08:19,723 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 15:08:56,580 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on ASR_libri: loss=0.2578, beats_loss=0, ecapa_loss=0.0006493, whisper_loss=0.2513, over 922467.00 frames. 
2024-08-11 15:09:15,611 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on SV_voxceleb1: loss=0.005328, beats_loss=0, ecapa_loss=0.0005328, whisper_loss=0, over 939242.00 frames. 2024-08-11 15:11:18,959 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on AT_audioset: loss=0.0249, beats_loss=0.0249, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 15:11:18,963 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 15:11:25,871 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 15:11:32,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1159380.0, ans=0.0 2024-08-11 15:12:14,550 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 15:12:36,441 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.333e+03 2024-08-11 15:12:41,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1159580.0, ans=0.07 2024-08-11 15:13:14,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=1159680.0, ans=0.1 2024-08-11 15:13:14,556 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=12.0 2024-08-11 15:13:37,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.78 vs. 
limit=22.5 2024-08-11 15:14:01,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1159780.0, ans=0.125 2024-08-11 15:14:07,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1159780.0, ans=0.09899494936611666 2024-08-11 15:14:32,820 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 50, loss[loss=0.0998, beats_loss=0.008951, ecapa_loss=0.0002396, whisper_loss=0.08845, over 17697.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01048, ecapa_loss=0.0002069, whisper_loss=0.09445, over 899321.25 frames. ], batch size: 72, lr: 7.24e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:14:42,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.60 vs. limit=22.5 2024-08-11 15:15:52,760 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.906e+01 3.207e+01 3.715e+01 5.089e+01, threshold=6.415e+01, percent-clipped=0.0 2024-08-11 15:17:07,018 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 15:17:23,366 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-11 15:17:59,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1160180.0, ans=0.2 2024-08-11 15:18:21,966 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
20 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-11 15:18:30,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1160280.0, ans=0.2 2024-08-11 15:18:48,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1160280.0, ans=0.125 2024-08-11 15:18:54,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.45 vs. limit=6.0 2024-08-11 15:19:03,943 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 100, loss[loss=0.1184, beats_loss=0.01102, ecapa_loss=0.0002229, whisper_loss=0.1052, over 21959.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0106, ecapa_loss=0.0002007, whisper_loss=0.09346, over 1547137.78 frames. ], batch size: 89, lr: 7.24e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:19:25,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1160380.0, ans=0.125 2024-08-11 15:19:25,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1160380.0, ans=0.2 2024-08-11 15:20:13,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1160480.0, ans=0.125 2024-08-11 15:20:27,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1160480.0, ans=0.125 2024-08-11 15:20:30,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1160480.0, ans=0.125 2024-08-11 15:20:39,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1160580.0, ans=0.0 2024-08-11 15:20:52,764 INFO [scaling.py:1024] (2/4) Whitening: 
name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2024-08-11 15:21:15,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1160680.0, ans=0.125 2024-08-11 15:21:20,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1160680.0, ans=0.1 2024-08-11 15:21:47,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1160780.0, ans=0.0 2024-08-11 15:22:03,418 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 150, loss[loss=0.09142, beats_loss=0.01198, ecapa_loss=0.0001986, whisper_loss=0.07746, over 22309.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01059, ecapa_loss=0.0001961, whisper_loss=0.09387, over 2058394.48 frames. ], batch size: 92, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:22:11,787 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 15:22:12,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1160880.0, ans=0.0 2024-08-11 15:22:14,205 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-11 15:22:31,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1160980.0, ans=0.125 2024-08-11 15:22:34,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1160980.0, ans=0.125 2024-08-11 15:22:43,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.986e+01 3.190e+01 3.682e+01 6.515e+01, threshold=6.380e+01, percent-clipped=1.0 2024-08-11 15:22:43,603 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-11 15:22:44,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1160980.0, ans=0.2 2024-08-11 15:23:08,539 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 15:23:15,452 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-11 15:23:19,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.59 vs. limit=22.5 2024-08-11 15:23:33,359 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 15:23:33,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1161180.0, ans=0.0 2024-08-11 15:23:48,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1161280.0, ans=0.0 2024-08-11 15:23:49,969 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-11 15:23:52,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1161280.0, ans=0.0 2024-08-11 15:24:02,897 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 200, loss[loss=0.1357, beats_loss=0.008242, ecapa_loss=0.0002237, whisper_loss=0.1252, over 16670.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01067, ecapa_loss=0.0001956, whisper_loss=0.0937, over 2438543.05 frames. ], batch size: 67, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:24:03,019 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 15:24:19,032 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2024-08-11 15:24:20,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1161480.0, ans=0.125 2024-08-11 15:24:22,293 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.87 vs. limit=12.0 2024-08-11 15:24:25,423 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.61 vs. limit=22.5 2024-08-11 15:24:27,079 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.85 vs. limit=22.5 2024-08-11 15:24:31,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1161480.0, ans=0.07 2024-08-11 15:24:43,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1161580.0, ans=0.0 2024-08-11 15:24:58,234 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.41 vs. limit=15.0 2024-08-11 15:25:05,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1161680.0, ans=0.2 2024-08-11 15:25:19,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1161780.0, ans=0.1 2024-08-11 15:25:21,067 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
15 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-11 15:25:26,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1161780.0, ans=0.125 2024-08-11 15:25:26,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1161780.0, ans=0.0 2024-08-11 15:25:35,127 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-11 15:25:35,828 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 250, loss[loss=0.1263, beats_loss=0.01015, ecapa_loss=0.0001676, whisper_loss=0.1145, over 18878.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01082, ecapa_loss=0.000194, whisper_loss=0.09389, over 2759021.81 frames. ], batch size: 69, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:25:56,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1161980.0, ans=0.125 2024-08-11 15:26:05,207 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.655e+01 2.964e+01 3.308e+01 4.229e+01, threshold=5.928e+01, percent-clipped=0.0 2024-08-11 15:26:06,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1161980.0, ans=0.125 2024-08-11 15:26:17,520 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 15:26:32,775 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 8 from Vox, 29 fro AS 2024-08-11 15:26:34,922 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
22 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 15:26:40,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1162180.0, ans=0.125 2024-08-11 15:26:55,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1162180.0, ans=0.2 2024-08-11 15:26:58,488 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-11 15:27:04,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1162280.0, ans=0.0 2024-08-11 15:27:13,309 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 15:27:17,337 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 15:27:21,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 300, loss[loss=0.103, beats_loss=0.0121, ecapa_loss=0.0001751, whisper_loss=0.08911, over 23065.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01089, ecapa_loss=0.0001947, whisper_loss=0.09308, over 2938750.57 frames. ], batch size: 93, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:27:27,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1162380.0, ans=10.0 2024-08-11 15:27:51,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1162480.0, ans=0.0 2024-08-11 15:27:56,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1162580.0, ans=0.125 2024-08-11 15:27:57,668 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 15:28:00,530 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
28 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 15:28:02,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1162580.0, ans=0.0 2024-08-11 15:28:04,574 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 25 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 15:28:11,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1162680.0, ans=0.125 2024-08-11 15:28:18,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1162680.0, ans=0.125 2024-08-11 15:28:39,417 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 350, loss[loss=0.1206, beats_loss=0.009526, ecapa_loss=0.0002245, whisper_loss=0.1088, over 23481.00 frames. ], tot_loss[loss=0.106, beats_loss=0.011, ecapa_loss=0.0001937, whisper_loss=0.09302, over 3153380.10 frames. ], batch size: 92, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:29:00,639 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.571e+01 3.026e+01 3.460e+01 5.079e+01, threshold=6.051e+01, percent-clipped=0.0 2024-08-11 15:29:28,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1163180.0, ans=0.2 2024-08-11 15:29:41,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1163280.0, ans=0.1 2024-08-11 15:29:43,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1163280.0, ans=0.125 2024-08-11 15:29:50,814 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 400, loss[loss=0.1082, beats_loss=0.01103, ecapa_loss=0.0001985, whisper_loss=0.09522, over 21191.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.01108, ecapa_loss=0.0001928, whisper_loss=0.09247, over 3294808.13 frames. ], batch size: 84, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:29:59,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0 2024-08-11 15:30:15,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.83 vs. limit=10.0 2024-08-11 15:30:30,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1163580.0, ans=0.2 2024-08-11 15:30:37,289 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 15:30:50,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1163780.0, ans=0.025 2024-08-11 15:31:01,309 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 450, loss[loss=0.1108, beats_loss=0.01015, ecapa_loss=0.0001943, whisper_loss=0.09868, over 22127.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01107, ecapa_loss=0.000193, whisper_loss=0.09236, over 3415363.42 frames. 
], batch size: 87, lr: 7.22e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:31:04,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1163880.0, ans=0.125 2024-08-11 15:31:04,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1163880.0, ans=0.125 2024-08-11 15:31:10,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1163880.0, ans=0.125 2024-08-11 15:31:10,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1163880.0, ans=0.125 2024-08-11 15:31:21,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.04 vs. limit=15.0 2024-08-11 15:31:22,795 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.636e+01 2.915e+01 3.353e+01 5.482e+01, threshold=5.829e+01, percent-clipped=0.0 2024-08-11 15:31:28,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1163980.0, ans=0.04949747468305833 2024-08-11 15:31:49,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1164180.0, ans=0.1 2024-08-11 15:31:52,149 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 15:31:56,819 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
30 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 15:31:58,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1164280.0, ans=0.2 2024-08-11 15:32:09,814 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=15.0 2024-08-11 15:32:12,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1164280.0, ans=0.0 2024-08-11 15:32:14,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 500, loss[loss=0.1085, beats_loss=0.01188, ecapa_loss=0.0001931, whisper_loss=0.09469, over 22098.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.0001924, whisper_loss=0.09232, over 3517153.54 frames. ], batch size: 87, lr: 7.22e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:32:16,060 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 15:32:29,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1164480.0, ans=0.125 2024-08-11 15:32:55,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1164580.0, ans=0.1 2024-08-11 15:33:05,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1164680.0, ans=0.05 2024-08-11 15:33:25,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1164880.0, ans=0.04949747468305833 2024-08-11 15:33:26,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 550, loss[loss=0.1074, beats_loss=0.008789, ecapa_loss=0.0002053, whisper_loss=0.0966, over 23814.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01107, ecapa_loss=0.0001912, whisper_loss=0.09155, over 3590175.75 frames. ], batch size: 94, lr: 7.22e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:33:48,120 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.637e+01 3.008e+01 3.365e+01 4.595e+01, threshold=6.017e+01, percent-clipped=0.0 2024-08-11 15:33:48,359 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 15:33:59,945 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 15:34:11,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1165180.0, ans=0.125 2024-08-11 15:34:16,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1165180.0, ans=0.1 2024-08-11 15:34:17,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1165180.0, ans=0.0 2024-08-11 15:34:21,648 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 15:34:31,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1165280.0, ans=0.1 2024-08-11 15:34:38,389 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 600, loss[loss=0.08057, beats_loss=0.0118, ecapa_loss=0.0001915, whisper_loss=0.06685, over 14505.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01114, ecapa_loss=0.0001902, whisper_loss=0.09108, over 3639883.90 frames. ], batch size: 56, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:34:40,679 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.797e+00 2024-08-11 15:34:42,856 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 15:34:55,117 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=15.0 2024-08-11 15:35:13,449 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 15:35:26,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1165680.0, ans=0.125 2024-08-11 15:35:49,740 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.49 vs. limit=22.5 2024-08-11 15:35:52,061 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 27 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 15:35:53,118 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 650, loss[loss=0.1377, beats_loss=0.009358, ecapa_loss=0.0001933, whisper_loss=0.1264, over 16190.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01112, ecapa_loss=0.0001896, whisper_loss=0.09142, over 3687736.78 frames. ], batch size: 62, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:35:56,415 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 15:35:57,980 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 15:36:16,129 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.692e+01 3.015e+01 3.566e+01 6.762e+01, threshold=6.030e+01, percent-clipped=2.0 2024-08-11 15:36:18,359 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 15:36:39,763 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 15:36:41,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1166180.0, ans=0.125 2024-08-11 15:36:58,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1166280.0, ans=0.125 2024-08-11 15:37:12,205 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 15:37:13,551 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 700, loss[loss=0.08117, beats_loss=0.01113, ecapa_loss=0.0001955, whisper_loss=0.06809, over 14568.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01108, ecapa_loss=0.0001916, whisper_loss=0.09178, over 3696800.74 frames. ], batch size: 56, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:37:22,242 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-11 15:37:23,491 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 15:37:26,916 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 15:37:29,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.98 vs. limit=10.0 2024-08-11 15:37:44,152 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 15:37:49,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1166580.0, ans=10.0 2024-08-11 15:37:53,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1166580.0, ans=0.125 2024-08-11 15:37:56,209 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
35 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 15:38:35,766 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 750, loss[loss=0.1124, beats_loss=0.01027, ecapa_loss=0.0001932, whisper_loss=0.1002, over 22393.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01104, ecapa_loss=0.0001921, whisper_loss=0.09218, over 3738645.10 frames. ], batch size: 89, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:38:48,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2024-08-11 15:38:59,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1166980.0, ans=0.0 2024-08-11 15:39:00,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.570e+01 2.889e+01 3.485e+01 5.934e+01, threshold=5.777e+01, percent-clipped=0.0 2024-08-11 15:39:13,345 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0 2024-08-11 15:39:26,971 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 15:39:59,684 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=22.5 2024-08-11 15:40:00,608 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 800, loss[loss=0.1036, beats_loss=0.01332, ecapa_loss=0.0001547, whisper_loss=0.08876, over 23191.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01111, ecapa_loss=0.0001905, whisper_loss=0.09209, over 3783669.52 frames. 
], batch size: 90, lr: 7.21e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:40:40,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1167580.0, ans=0.125 2024-08-11 15:40:51,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1167680.0, ans=0.125 2024-08-11 15:40:57,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1167680.0, ans=0.125 2024-08-11 15:41:14,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1167780.0, ans=0.0 2024-08-11 15:41:15,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1167780.0, ans=0.125 2024-08-11 15:41:25,098 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 850, loss[loss=0.1162, beats_loss=0.009292, ecapa_loss=0.0002064, whisper_loss=0.1049, over 21372.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01103, ecapa_loss=0.0001909, whisper_loss=0.09199, over 3785834.33 frames. ], batch size: 83, lr: 7.21e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:41:30,311 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 15:41:52,895 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.648e+01 3.009e+01 3.325e+01 6.049e+01, threshold=6.017e+01, percent-clipped=1.0 2024-08-11 15:41:58,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1168080.0, ans=0.1 2024-08-11 15:42:03,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1168080.0, ans=0.0 2024-08-11 15:42:06,719 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
28 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 15:42:07,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1168080.0, ans=0.125 2024-08-11 15:42:16,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1168180.0, ans=0.125 2024-08-11 15:42:18,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1168180.0, ans=0.0 2024-08-11 15:42:24,056 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0 2024-08-11 15:42:27,331 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 15:42:29,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1168180.0, ans=0.125 2024-08-11 15:42:38,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1168280.0, ans=0.125 2024-08-11 15:42:50,111 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 900, loss[loss=0.09554, beats_loss=0.01444, ecapa_loss=0.0001363, whisper_loss=0.07974, over 18403.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01114, ecapa_loss=0.0001904, whisper_loss=0.09181, over 3818389.89 frames. ], batch size: 71, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:42:51,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=22.5 2024-08-11 15:43:01,401 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
21 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-11 15:43:15,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=12.0 2024-08-11 15:43:21,733 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2024-08-11 15:43:21,797 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5 2024-08-11 15:43:50,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1168680.0, ans=0.125 2024-08-11 15:43:53,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1168680.0, ans=0.0 2024-08-11 15:44:11,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2024-08-11 15:44:15,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 950, loss[loss=0.1207, beats_loss=0.01024, ecapa_loss=0.0001739, whisper_loss=0.1087, over 23235.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01107, ecapa_loss=0.0001902, whisper_loss=0.09234, over 3817279.95 frames. ], batch size: 88, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:44:26,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1168880.0, ans=0.125 2024-08-11 15:44:27,634 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
28 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 15:44:36,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1168980.0, ans=0.125 2024-08-11 15:44:42,603 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.663e+01 2.966e+01 3.403e+01 1.009e+02, threshold=5.932e+01, percent-clipped=1.0 2024-08-11 15:44:57,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1169080.0, ans=0.2 2024-08-11 15:45:21,864 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-11 15:45:28,570 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 15:45:37,304 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1000, loss[loss=0.1235, beats_loss=0.009449, ecapa_loss=0.0001779, whisper_loss=0.1123, over 23280.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01111, ecapa_loss=0.0001882, whisper_loss=0.09204, over 3815764.36 frames. ], batch size: 88, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:45:40,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1169380.0, ans=0.2 2024-08-11 15:45:48,866 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 30 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-11 15:46:10,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1169580.0, ans=0.125 2024-08-11 15:46:24,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2024-08-11 15:46:40,909 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 15:46:45,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1169780.0, ans=0.1 2024-08-11 15:46:46,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1169780.0, ans=0.2 2024-08-11 15:46:51,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1169780.0, ans=0.125 2024-08-11 15:46:51,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1169780.0, ans=0.0 2024-08-11 15:47:00,913 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1050, loss[loss=0.08075, beats_loss=0.01348, ecapa_loss=0.0001432, whisper_loss=0.06585, over 17827.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01114, ecapa_loss=0.0001882, whisper_loss=0.09149, over 3816792.18 frames. ], batch size: 68, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:47:13,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1169880.0, ans=0.025 2024-08-11 15:47:13,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.06 vs. limit=10.0 2024-08-11 15:47:17,023 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-11 15:47:29,256 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.579e+01 2.847e+01 3.241e+01 6.261e+01, threshold=5.695e+01, percent-clipped=1.0 2024-08-11 15:47:31,706 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
25 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 15:48:00,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.29 vs. limit=15.0 2024-08-11 15:48:31,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1170380.0, ans=0.04949747468305833 2024-08-11 15:48:32,766 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1100, loss[loss=0.1349, beats_loss=0.008439, ecapa_loss=0.0001935, whisper_loss=0.1245, over 24455.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01109, ecapa_loss=0.0001877, whisper_loss=0.09329, over 3867369.65 frames. ], batch size: 92, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:48:51,513 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0 2024-08-11 15:48:53,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1170480.0, ans=0.0 2024-08-11 15:48:58,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1170480.0, ans=0.2 2024-08-11 15:49:03,739 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 15:49:05,655 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-11 15:49:05,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1170480.0, ans=0.2 2024-08-11 15:49:33,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. 
limit=6.0 2024-08-11 15:49:41,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1170780.0, ans=0.125 2024-08-11 15:49:46,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1170780.0, ans=6.0 2024-08-11 15:49:58,936 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1150, loss[loss=0.0991, beats_loss=0.009787, ecapa_loss=0.0001453, whisper_loss=0.08786, over 21731.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01109, ecapa_loss=0.0001864, whisper_loss=0.09343, over 3836142.27 frames. ], batch size: 81, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:50:01,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=15.0 2024-08-11 15:50:03,553 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 15:50:05,679 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 15:50:06,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-08-11 15:50:06,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.62 vs. limit=22.5 2024-08-11 15:50:12,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.07 vs. 
limit=10.0 2024-08-11 15:50:15,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1170980.0, ans=0.05 2024-08-11 15:50:25,758 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.574e+01 2.982e+01 3.415e+01 5.178e+01, threshold=5.965e+01, percent-clipped=0.0 2024-08-11 15:50:53,776 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 15:50:54,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1171180.0, ans=0.0 2024-08-11 15:50:54,118 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.980e-01 2024-08-11 15:50:55,782 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-11 15:51:10,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=15.0 2024-08-11 15:51:20,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1200, loss[loss=0.1058, beats_loss=0.01267, ecapa_loss=0.0001577, whisper_loss=0.09154, over 18330.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01123, ecapa_loss=0.0001852, whisper_loss=0.09229, over 3824714.26 frames. ], batch size: 71, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:51:32,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2024-08-11 15:51:41,651 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
16 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-11 15:51:54,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1171580.0, ans=0.125 2024-08-11 15:52:11,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1171680.0, ans=0.0 2024-08-11 15:52:19,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1171680.0, ans=0.2 2024-08-11 15:52:21,404 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.480e+05 2024-08-11 15:52:42,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1250, loss[loss=0.08577, beats_loss=0.0119, ecapa_loss=0.0002001, whisper_loss=0.07187, over 18867.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01131, ecapa_loss=0.0001847, whisper_loss=0.09152, over 3803276.30 frames. ], batch size: 75, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:52:54,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=22.5 2024-08-11 15:53:06,806 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.75 vs. 
limit=10.0 2024-08-11 15:53:07,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.593e+01 3.089e+01 3.473e+01 5.447e+01, threshold=6.177e+01, percent-clipped=0.0 2024-08-11 15:53:16,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1172080.0, ans=0.125 2024-08-11 15:53:18,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1172080.0, ans=0.05 2024-08-11 15:53:21,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1172080.0, ans=0.2 2024-08-11 15:53:21,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1172080.0, ans=0.125 2024-08-11 15:53:35,883 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-11 15:53:36,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1172180.0, ans=0.0 2024-08-11 15:53:47,642 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 29 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 15:53:56,580 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 15:54:02,370 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1300, loss[loss=0.09474, beats_loss=0.0103, ecapa_loss=0.0002101, whisper_loss=0.08233, over 14981.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01136, ecapa_loss=0.0001847, whisper_loss=0.0911, over 3786597.64 frames. ], batch size: 60, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:54:12,281 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 15:54:13,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.00 vs. limit=15.0 2024-08-11 15:54:22,509 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 15:54:35,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=15.0 2024-08-11 15:54:40,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1172580.0, ans=0.2 2024-08-11 15:54:40,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2024-08-11 15:54:53,030 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 15:54:56,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1172680.0, ans=0.0 2024-08-11 15:55:04,785 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.07 vs. limit=15.0 2024-08-11 15:55:10,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1172780.0, ans=0.2 2024-08-11 15:55:22,766 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1350, loss[loss=0.1013, beats_loss=0.006471, ecapa_loss=0.0001916, whisper_loss=0.09295, over 15173.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0113, ecapa_loss=0.0001863, whisper_loss=0.09131, over 3787851.76 frames. ], batch size: 55, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:55:33,828 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
22 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-11 15:55:41,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1172980.0, ans=0.125 2024-08-11 15:55:51,496 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 2.580e+01 3.028e+01 3.578e+01 5.392e+01, threshold=6.056e+01, percent-clipped=0.0 2024-08-11 15:56:02,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1173080.0, ans=0.125 2024-08-11 15:56:15,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1173180.0, ans=0.0 2024-08-11 15:56:15,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1173180.0, ans=0.125 2024-08-11 15:56:16,615 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 29 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 15:56:20,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2024-08-11 15:56:24,264 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 15:56:28,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1173180.0, ans=0.07 2024-08-11 15:56:45,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1173280.0, ans=0.125 2024-08-11 15:56:50,294 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1400, loss[loss=0.08385, beats_loss=0.01113, ecapa_loss=0.0002275, whisper_loss=0.07045, over 16597.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01118, ecapa_loss=0.0001874, whisper_loss=0.09182, over 3797038.47 frames. 
], batch size: 71, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:56:53,636 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 15:56:56,815 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 15:57:07,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1173480.0, ans=0.1 2024-08-11 15:57:13,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-08-11 15:57:17,140 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-11 15:58:02,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1173780.0, ans=0.0 2024-08-11 15:58:05,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1173780.0, ans=0.125 2024-08-11 15:58:13,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1450, loss[loss=0.1077, beats_loss=0.009726, ecapa_loss=0.0002102, whisper_loss=0.09584, over 19724.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01112, ecapa_loss=0.0001869, whisper_loss=0.0924, over 3807695.77 frames. ], batch size: 79, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:58:13,504 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 15:58:50,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1173880.0, ans=0.0 2024-08-11 15:58:59,113 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
29 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 15:59:09,544 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.169e+01 2.580e+01 2.876e+01 3.331e+01 4.704e+01, threshold=5.752e+01, percent-clipped=0.0 2024-08-11 15:59:35,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1174180.0, ans=0.125 2024-08-11 15:59:41,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1174180.0, ans=0.07 2024-08-11 16:00:06,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1500, loss[loss=0.1038, beats_loss=0.01283, ecapa_loss=0.0001697, whisper_loss=0.08931, over 22348.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01119, ecapa_loss=0.0001867, whisper_loss=0.09153, over 3824408.73 frames. ], batch size: 89, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:00:14,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2024-08-11 16:00:39,420 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-11 16:00:49,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1174580.0, ans=0.125 2024-08-11 16:01:09,665 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
26 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 16:01:11,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1174780.0, ans=0.1 2024-08-11 16:01:13,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1174780.0, ans=0.125 2024-08-11 16:01:24,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1174780.0, ans=0.1 2024-08-11 16:01:26,574 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1550, loss[loss=0.1258, beats_loss=0.01003, ecapa_loss=0.0001934, whisper_loss=0.1138, over 23877.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01117, ecapa_loss=0.0001854, whisper_loss=0.09188, over 3839764.51 frames. ], batch size: 91, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:01:28,951 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 16:01:31,868 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-11 16:01:35,603 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 16:01:49,751 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.347e-02 2024-08-11 16:01:52,168 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.587e+01 2.923e+01 3.490e+01 5.175e+01, threshold=5.845e+01, percent-clipped=0.0 2024-08-11 16:02:03,278 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 16:02:11,452 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-11 16:02:18,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1175180.0, ans=0.2 2024-08-11 16:02:20,155 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.851e+05 2024-08-11 16:02:23,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1175180.0, ans=0.2 2024-08-11 16:02:26,460 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2024-08-11 16:02:27,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1175280.0, ans=0.2 2024-08-11 16:02:41,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1175280.0, ans=0.125 2024-08-11 16:02:43,906 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1600, loss[loss=0.1043, beats_loss=0.01189, ecapa_loss=0.0001905, whisper_loss=0.0905, over 22082.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01115, ecapa_loss=0.0001856, whisper_loss=0.09195, over 3847509.10 frames. ], batch size: 86, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:02:51,916 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-11 16:02:59,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1175480.0, ans=0.125 2024-08-11 16:03:29,441 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.44 vs. 
limit=22.5 2024-08-11 16:03:31,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2024-08-11 16:03:38,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1175680.0, ans=0.2 2024-08-11 16:03:51,042 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 16:03:57,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1175780.0, ans=0.0 2024-08-11 16:04:00,803 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1650, loss[loss=0.1161, beats_loss=0.006905, ecapa_loss=0.0001958, whisper_loss=0.1072, over 17492.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0111, ecapa_loss=0.0001853, whisper_loss=0.09257, over 3854344.34 frames. ], batch size: 62, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:04:04,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1175880.0, ans=0.125 2024-08-11 16:04:25,549 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.491e+01 2.765e+01 3.253e+01 5.216e+01, threshold=5.529e+01, percent-clipped=0.0 2024-08-11 16:04:46,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.07 vs. 
limit=15.0 2024-08-11 16:04:58,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1176180.0, ans=0.125 2024-08-11 16:05:14,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1176280.0, ans=0.0 2024-08-11 16:05:17,566 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1700, loss[loss=0.07796, beats_loss=0.01131, ecapa_loss=0.0002032, whisper_loss=0.06462, over 14917.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01102, ecapa_loss=0.0001847, whisper_loss=0.09323, over 3889924.12 frames. ], batch size: 63, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:05:20,879 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 16:05:32,422 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 16:05:55,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1176580.0, ans=0.0 2024-08-11 16:06:09,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1176680.0, ans=0.2 2024-08-11 16:06:09,520 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0 2024-08-11 16:06:30,643 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1750, loss[loss=0.09448, beats_loss=0.01122, ecapa_loss=0.0002014, whisper_loss=0.08125, over 14395.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01097, ecapa_loss=0.0001841, whisper_loss=0.09367, over 3911122.15 frames. 
], batch size: 60, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:06:31,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1176880.0, ans=0.0 2024-08-11 16:06:40,873 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-11 16:06:43,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1176980.0, ans=0.0 2024-08-11 16:06:52,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1176980.0, ans=0.125 2024-08-11 16:06:53,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.634e+01 3.052e+01 3.436e+01 4.631e+01, threshold=6.105e+01, percent-clipped=0.0 2024-08-11 16:07:00,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1177080.0, ans=0.125 2024-08-11 16:07:06,175 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5 2024-08-11 16:07:11,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1177080.0, ans=0.125 2024-08-11 16:07:13,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2024-08-11 16:07:34,205 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 16:07:34,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1177280.0, ans=0.125 2024-08-11 16:07:42,305 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1800, loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.000148, whisper_loss=0.09187, over 17535.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.01107, ecapa_loss=0.0001833, whisper_loss=0.09314, over 3906997.58 frames. ], batch size: 65, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:08:02,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1177480.0, ans=0.125 2024-08-11 16:08:02,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1177480.0, ans=22.5 2024-08-11 16:08:06,831 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 27 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-11 16:08:09,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1177580.0, ans=0.2 2024-08-11 16:08:17,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1177580.0, ans=0.125 2024-08-11 16:08:25,161 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.57 vs. limit=15.0 2024-08-11 16:08:40,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1177780.0, ans=0.125 2024-08-11 16:08:54,324 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1850, loss[loss=0.1109, beats_loss=0.01137, ecapa_loss=0.0002092, whisper_loss=0.09739, over 23488.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01107, ecapa_loss=0.0001848, whisper_loss=0.09331, over 3895989.46 frames. ], batch size: 94, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:09:02,312 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 16:09:18,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.637e+01 3.046e+01 3.560e+01 5.616e+01, threshold=6.093e+01, percent-clipped=0.0 2024-08-11 16:09:19,952 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-11 16:09:21,347 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 16:09:26,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1178080.0, ans=0.1 2024-08-11 16:09:37,006 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 16:09:41,340 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 14 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-11 16:10:00,636 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 16:10:07,670 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1900, loss[loss=0.1049, beats_loss=0.009929, ecapa_loss=0.0001988, whisper_loss=0.09293, over 19168.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01105, ecapa_loss=0.0001857, whisper_loss=0.09355, over 3885882.43 frames. 
], batch size: 77, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:10:09,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1178380.0, ans=0.07 2024-08-11 16:10:21,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1178380.0, ans=0.5 2024-08-11 16:10:39,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1178580.0, ans=0.125 2024-08-11 16:10:49,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1178580.0, ans=0.0 2024-08-11 16:10:55,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1178680.0, ans=0.125 2024-08-11 16:10:57,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1178680.0, ans=0.125 2024-08-11 16:11:10,484 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 16:11:22,048 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 1950, loss[loss=0.08126, beats_loss=0.01313, ecapa_loss=0.0001759, whisper_loss=0.06637, over 17831.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01113, ecapa_loss=0.0001877, whisper_loss=0.09275, over 3878061.83 frames. ], batch size: 71, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:11:25,889 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=22.5 2024-08-11 16:11:29,339 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 16:11:39,563 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 16:11:40,772 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 16:11:45,259 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.606e+01 2.950e+01 3.514e+01 8.174e+01, threshold=5.900e+01, percent-clipped=2.0 2024-08-11 16:11:48,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1178980.0, ans=0.0 2024-08-11 16:12:06,682 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-11 16:12:36,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2000, loss[loss=0.09496, beats_loss=0.0116, ecapa_loss=0.000193, whisper_loss=0.08143, over 16281.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01112, ecapa_loss=0.0001896, whisper_loss=0.0927, over 3854406.25 frames. ], batch size: 65, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:12:37,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5 2024-08-11 16:12:50,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1179480.0, ans=0.0 2024-08-11 16:13:01,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1179480.0, ans=0.0 2024-08-11 16:13:05,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1179580.0, ans=0.0 2024-08-11 16:13:11,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1179580.0, ans=0.125 2024-08-11 16:13:11,787 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
23 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 16:13:19,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1179580.0, ans=0.07 2024-08-11 16:13:26,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=12.0 2024-08-11 16:13:32,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1179680.0, ans=0.125 2024-08-11 16:13:39,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1179780.0, ans=0.125 2024-08-11 16:13:46,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1179780.0, ans=0.125 2024-08-11 16:13:53,186 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2050, loss[loss=0.09658, beats_loss=0.01307, ecapa_loss=0.0001677, whisper_loss=0.08184, over 18365.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01117, ecapa_loss=0.0001895, whisper_loss=0.09192, over 3827774.50 frames. ], batch size: 71, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:13:53,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1179880.0, ans=0.2 2024-08-11 16:14:02,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.50 vs. 
limit=15.0 2024-08-11 16:14:04,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1179880.0, ans=0.035 2024-08-11 16:14:18,621 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.671e+01 2.965e+01 3.227e+01 2.393e+02, threshold=5.931e+01, percent-clipped=1.0 2024-08-11 16:14:29,972 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 16:14:47,875 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-11 16:15:03,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1180280.0, ans=0.0 2024-08-11 16:15:14,648 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2100, loss[loss=0.1108, beats_loss=0.01199, ecapa_loss=0.0002009, whisper_loss=0.09676, over 18442.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01121, ecapa_loss=0.0001909, whisper_loss=0.09208, over 3837597.14 frames. ], batch size: 76, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:15:16,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1180380.0, ans=0.0 2024-08-11 16:15:26,255 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 16:15:37,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1180480.0, ans=0.1 2024-08-11 16:15:42,542 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 16:15:48,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1180580.0, ans=0.125 2024-08-11 16:15:53,757 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
28 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 16:16:10,477 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 41 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 16:16:12,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1180680.0, ans=0.0 2024-08-11 16:16:17,877 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2024-08-11 16:16:19,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2024-08-11 16:16:34,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2024-08-11 16:16:37,813 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2150, loss[loss=0.1063, beats_loss=0.01275, ecapa_loss=0.0002305, whisper_loss=0.0912, over 21239.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01129, ecapa_loss=0.0001901, whisper_loss=0.09258, over 3874901.95 frames. ], batch size: 92, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:16:43,168 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
19 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 16:16:48,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1180880.0, ans=0.125 2024-08-11 16:16:56,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1180980.0, ans=0.0 2024-08-11 16:16:56,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1180980.0, ans=0.0 2024-08-11 16:17:03,539 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.740e+01 2.984e+01 3.481e+01 5.761e+01, threshold=5.968e+01, percent-clipped=0.0 2024-08-11 16:17:08,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2024-08-11 16:17:10,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2024-08-11 16:17:17,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1181080.0, ans=0.1 2024-08-11 16:17:20,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.51 vs. 
limit=15.0 2024-08-11 16:17:27,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1181180.0, ans=0.125 2024-08-11 16:17:58,102 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 16:18:01,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1181380.0, ans=0.0 2024-08-11 16:18:02,163 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2200, loss[loss=0.1282, beats_loss=0.007669, ecapa_loss=0.0001915, whisper_loss=0.1187, over 16307.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01131, ecapa_loss=0.0001907, whisper_loss=0.09272, over 3861153.48 frames. ], batch size: 61, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:18:02,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1181380.0, ans=0.125 2024-08-11 16:18:10,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1181380.0, ans=22.5 2024-08-11 16:18:11,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.74 vs. limit=12.0 2024-08-11 16:18:17,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.90 vs. 
limit=22.5 2024-08-11 16:18:18,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1181480.0, ans=0.1 2024-08-11 16:18:18,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1181480.0, ans=0.1 2024-08-11 16:18:20,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1181480.0, ans=0.0 2024-08-11 16:18:22,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1181480.0, ans=0.025 2024-08-11 16:18:27,325 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 16:18:39,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1181580.0, ans=0.125 2024-08-11 16:18:49,208 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=15.21 vs. limit=15.0 2024-08-11 16:18:52,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1181680.0, ans=0.2 2024-08-11 16:19:04,859 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 16:19:17,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1181780.0, ans=0.1 2024-08-11 16:19:18,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1181780.0, ans=0.2 2024-08-11 16:19:24,433 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2250, loss[loss=0.1155, beats_loss=0.01181, ecapa_loss=0.000161, whisper_loss=0.1021, over 15921.00 frames. 
], tot_loss[loss=0.1064, beats_loss=0.01135, ecapa_loss=0.0001913, whisper_loss=0.09312, over 3880400.41 frames. ], batch size: 61, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:19:25,172 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 16:19:34,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.17 vs. limit=22.5 2024-08-11 16:19:39,468 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-11 16:19:41,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1181980.0, ans=0.2 2024-08-11 16:19:42,996 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 39 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 16:19:47,956 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 16:19:50,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.696e+01 3.022e+01 3.450e+01 8.988e+01, threshold=6.044e+01, percent-clipped=1.0 2024-08-11 16:20:17,284 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 16:20:27,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1182280.0, ans=0.0 2024-08-11 16:20:36,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0 2024-08-11 16:20:37,139 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 16:20:45,186 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2300, loss[loss=0.1004, beats_loss=0.01217, ecapa_loss=0.0001372, whisper_loss=0.08684, over 20294.00 frames. 
], tot_loss[loss=0.1069, beats_loss=0.01129, ecapa_loss=0.0001913, whisper_loss=0.09372, over 3904453.84 frames. ], batch size: 78, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:21:02,279 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 16:21:05,519 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.019e+05 2024-08-11 16:21:20,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1182580.0, ans=0.125 2024-08-11 16:21:24,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1182580.0, ans=0.2 2024-08-11 16:21:26,285 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 16:21:26,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1182580.0, ans=0.0 2024-08-11 16:21:26,914 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.06 vs. limit=22.5 2024-08-11 16:21:39,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1182680.0, ans=0.125 2024-08-11 16:21:51,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1182780.0, ans=0.125 2024-08-11 16:22:05,727 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2350, loss[loss=0.1105, beats_loss=0.008977, ecapa_loss=0.000191, whisper_loss=0.09964, over 17020.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01121, ecapa_loss=0.0001925, whisper_loss=0.09377, over 3852216.50 frames. 
], batch size: 66, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:22:08,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.30 vs. limit=10.0 2024-08-11 16:22:09,430 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 16:22:26,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0 2024-08-11 16:22:33,858 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2024-08-11 16:22:34,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.605e+01 2.959e+01 3.391e+01 6.517e+01, threshold=5.918e+01, percent-clipped=1.0 2024-08-11 16:22:39,376 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 16:22:56,556 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 16:23:00,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1183180.0, ans=0.125 2024-08-11 16:23:03,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1183180.0, ans=0.0 2024-08-11 16:23:03,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1183180.0, ans=0.125 2024-08-11 16:23:30,530 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2400, loss[loss=0.1221, beats_loss=0.01008, ecapa_loss=0.0002223, whisper_loss=0.1098, over 22875.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01111, ecapa_loss=0.0001912, whisper_loss=0.09457, over 3852231.06 frames. 
], batch size: 92, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:23:32,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1183380.0, ans=0.125 2024-08-11 16:23:33,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1183380.0, ans=0.1 2024-08-11 16:23:43,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1183380.0, ans=0.0 2024-08-11 16:23:45,262 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 15 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 16:23:45,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1183480.0, ans=0.2 2024-08-11 16:23:48,726 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 16:23:49,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.94 vs. limit=12.0 2024-08-11 16:23:49,534 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2024-08-11 16:24:10,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1183580.0, ans=0.125 2024-08-11 16:24:22,562 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 16:24:46,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1183780.0, ans=0.07 2024-08-11 16:24:52,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1183780.0, ans=0.125 2024-08-11 16:24:52,889 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2024-08-11 16:24:55,471 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2450, loss[loss=0.1127, beats_loss=0.009378, ecapa_loss=0.0002412, whisper_loss=0.1009, over 12961.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01109, ecapa_loss=0.0001914, whisper_loss=0.09413, over 3839558.85 frames. ], batch size: 55, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:25:05,460 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 16:25:10,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1183980.0, ans=0.125 2024-08-11 16:25:12,314 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.539e+00 2024-08-11 16:25:20,558 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.638e+01 2.982e+01 3.407e+01 5.711e+01, threshold=5.963e+01, percent-clipped=0.0 2024-08-11 16:25:29,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1184080.0, ans=0.2 2024-08-11 16:25:33,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=1184080.0, ans=0.02 2024-08-11 16:25:33,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1184080.0, ans=0.125 
2024-08-11 16:25:37,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1184080.0, ans=0.125 2024-08-11 16:25:47,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2024-08-11 16:26:11,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1184280.0, ans=0.0 2024-08-11 16:26:18,235 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2500, loss[loss=0.1078, beats_loss=0.012, ecapa_loss=0.0001992, whisper_loss=0.09381, over 22309.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01104, ecapa_loss=0.0001917, whisper_loss=0.09377, over 3837784.00 frames. ], batch size: 91, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:26:20,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.62 vs. limit=10.0 2024-08-11 16:26:21,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1184380.0, ans=0.125 2024-08-11 16:26:30,297 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
30 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 16:26:51,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1184580.0, ans=0.125 2024-08-11 16:26:53,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1184580.0, ans=0.2 2024-08-11 16:26:55,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1184580.0, ans=0.025 2024-08-11 16:27:13,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1184680.0, ans=0.2 2024-08-11 16:27:15,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2024-08-11 16:27:27,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1184780.0, ans=0.125 2024-08-11 16:27:35,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1184780.0, ans=0.125 2024-08-11 16:27:36,456 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-11 16:27:45,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2550, loss[loss=0.1025, beats_loss=0.0127, ecapa_loss=0.0001462, whisper_loss=0.08829, over 14656.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01113, ecapa_loss=0.0001912, whisper_loss=0.09353, over 3859538.63 frames. ], batch size: 55, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:27:55,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.96 vs. 
limit=22.5 2024-08-11 16:28:00,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1184980.0, ans=0.1 2024-08-11 16:28:06,996 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 16:28:11,797 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.549e+01 2.871e+01 3.222e+01 4.395e+01, threshold=5.742e+01, percent-clipped=0.0 2024-08-11 16:28:18,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.54 vs. limit=22.5 2024-08-11 16:28:29,430 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 16:28:43,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1185180.0, ans=0.1 2024-08-11 16:28:50,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1185180.0, ans=0.015 2024-08-11 16:28:56,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1185280.0, ans=0.0 2024-08-11 16:29:10,415 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2600, loss[loss=0.09489, beats_loss=0.009562, ecapa_loss=0.0001888, whisper_loss=0.08344, over 15840.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01116, ecapa_loss=0.0001919, whisper_loss=0.09344, over 3892626.16 frames. ], batch size: 62, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:29:23,230 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 16:29:41,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.69 vs. 
limit=22.5 2024-08-11 16:29:45,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1185580.0, ans=0.125 2024-08-11 16:30:34,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2650, loss[loss=0.1092, beats_loss=0.01064, ecapa_loss=0.0002233, whisper_loss=0.09631, over 22301.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01118, ecapa_loss=0.0001918, whisper_loss=0.09276, over 3899613.78 frames. ], batch size: 90, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:30:43,655 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 9 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-11 16:30:44,591 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.33 vs. limit=6.0 2024-08-11 16:30:45,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1185880.0, ans=0.0 2024-08-11 16:30:46,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1185880.0, ans=0.125 2024-08-11 16:30:48,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1185880.0, ans=0.125 2024-08-11 16:30:54,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1185980.0, ans=0.0 2024-08-11 16:30:59,130 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 16:31:01,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.668e+01 2.978e+01 3.517e+01 4.989e+01, threshold=5.956e+01, percent-clipped=0.0 2024-08-11 16:31:12,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1186080.0, ans=0.125 2024-08-11 16:31:25,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1186180.0, ans=0.2 2024-08-11 16:31:29,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1186180.0, ans=0.125 2024-08-11 16:31:52,650 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-11 16:31:53,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1186280.0, ans=0.2 2024-08-11 16:31:58,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1186380.0, ans=0.125 2024-08-11 16:31:59,030 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2700, loss[loss=0.1063, beats_loss=0.01343, ecapa_loss=0.0001648, whisper_loss=0.09126, over 23712.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0112, ecapa_loss=0.0001916, whisper_loss=0.09182, over 3875749.88 frames. 
], batch size: 94, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:32:06,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1186380.0, ans=0.125 2024-08-11 16:32:14,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1186480.0, ans=0.125 2024-08-11 16:32:38,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.85 vs. limit=10.0 2024-08-11 16:32:51,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1186680.0, ans=0.1 2024-08-11 16:33:02,146 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 16:33:03,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1186780.0, ans=0.1 2024-08-11 16:33:06,867 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 16:33:15,166 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 16:33:20,275 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2750, loss[loss=0.1142, beats_loss=0.009895, ecapa_loss=0.0001789, whisper_loss=0.1025, over 23162.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01119, ecapa_loss=0.0001919, whisper_loss=0.09199, over 3829450.23 frames. ], batch size: 88, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:33:27,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1186880.0, ans=0.2 2024-08-11 16:33:28,666 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 16:33:32,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1186880.0, ans=0.02 2024-08-11 16:33:33,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1186880.0, ans=0.1 2024-08-11 16:33:45,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1186980.0, ans=0.125 2024-08-11 16:33:47,018 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.790e+01 3.167e+01 3.660e+01 5.593e+01, threshold=6.335e+01, percent-clipped=0.0 2024-08-11 16:33:55,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1187080.0, ans=0.125 2024-08-11 16:33:55,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1187080.0, ans=0.0 2024-08-11 16:34:02,916 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 16:34:08,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2024-08-11 16:34:10,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1187180.0, ans=0.2 2024-08-11 16:34:18,740 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 16:34:32,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1187280.0, ans=0.125 2024-08-11 16:34:42,607 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2800, loss[loss=0.09401, beats_loss=0.0133, ecapa_loss=0.000175, whisper_loss=0.07896, over 13015.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01128, ecapa_loss=0.0001911, whisper_loss=0.09172, over 3829169.83 frames. ], batch size: 53, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:34:54,991 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-11 16:34:59,368 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 16:35:09,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1187480.0, ans=0.0 2024-08-11 16:35:14,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1187580.0, ans=0.125 2024-08-11 16:35:28,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2024-08-11 16:35:35,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1187680.0, ans=0.125 2024-08-11 16:35:39,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1187680.0, ans=0.125 2024-08-11 16:35:47,037 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
15 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 16:35:57,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1187780.0, ans=0.125 2024-08-11 16:36:04,565 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2850, loss[loss=0.09921, beats_loss=0.01087, ecapa_loss=0.0002209, whisper_loss=0.08614, over 19627.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01133, ecapa_loss=0.000191, whisper_loss=0.09143, over 3839691.29 frames. ], batch size: 79, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:36:07,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1187880.0, ans=0.1 2024-08-11 16:36:31,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.618e+01 2.962e+01 3.443e+01 5.615e+01, threshold=5.924e+01, percent-clipped=0.0 2024-08-11 16:36:49,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1188080.0, ans=0.125 2024-08-11 16:36:55,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1188180.0, ans=0.1 2024-08-11 16:37:01,597 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-11 16:37:10,044 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-11 16:37:11,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1188280.0, ans=0.125 2024-08-11 16:37:28,191 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2900, loss[loss=0.1298, beats_loss=0.009074, ecapa_loss=0.0001773, whisper_loss=0.1189, over 23295.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01138, ecapa_loss=0.0001906, whisper_loss=0.0919, over 3867594.67 frames. 
], batch size: 88, lr: 7.15e-03, grad_scale: 1.152921504606847e+18 2024-08-11 16:37:30,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1188380.0, ans=0.2 2024-08-11 16:37:36,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1188380.0, ans=0.2 2024-08-11 16:37:41,253 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 16:37:43,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1188480.0, ans=0.95 2024-08-11 16:37:45,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1188480.0, ans=0.125 2024-08-11 16:37:53,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1188480.0, ans=0.125 2024-08-11 16:38:12,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1188580.0, ans=0.125 2024-08-11 16:38:25,733 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=12.0 2024-08-11 16:38:34,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1188780.0, ans=0.2 2024-08-11 16:38:43,946 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 2950, loss[loss=0.1089, beats_loss=0.01069, ecapa_loss=0.0001956, whisper_loss=0.09622, over 15011.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01131, ecapa_loss=0.0001919, whisper_loss=0.09207, over 3834469.13 frames. 
], batch size: 58, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:38:44,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1188880.0, ans=0.2 2024-08-11 16:38:46,716 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 16:38:59,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-11 16:39:00,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1188980.0, ans=0.125 2024-08-11 16:39:03,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1188980.0, ans=0.125 2024-08-11 16:39:04,802 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.27 vs. limit=22.5 2024-08-11 16:39:06,569 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.714e+01 3.075e+01 3.561e+01 5.736e+01, threshold=6.149e+01, percent-clipped=0.0 2024-08-11 16:39:16,563 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 16:39:43,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1189280.0, ans=0.0 2024-08-11 16:39:49,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.49 vs. limit=15.0 2024-08-11 16:39:51,285 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3000, loss[loss=0.09917, beats_loss=0.01058, ecapa_loss=0.0002144, whisper_loss=0.08644, over 22617.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01128, ecapa_loss=0.0001912, whisper_loss=0.09261, over 3866387.93 frames. 
], batch size: 91, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:39:51,286 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 16:40:32,704 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on ASR_libri: loss=0.2566, beats_loss=0, ecapa_loss=0.0006312, whisper_loss=0.2502, over 922467.00 frames. 2024-08-11 16:40:50,137 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on SV_voxceleb1: loss=0.005299, beats_loss=0, ecapa_loss=0.0005299, whisper_loss=0, over 939242.00 frames. 2024-08-11 16:42:48,151 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on AT_audioset: loss=0.02498, beats_loss=0.02498, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 16:42:48,155 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 16:42:48,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1189380.0, ans=0.125 2024-08-11 16:43:02,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1189480.0, ans=0.0 2024-08-11 16:43:09,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1189480.0, ans=0.0 2024-08-11 16:43:11,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1189480.0, ans=0.2 2024-08-11 16:43:30,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1189680.0, ans=0.125 2024-08-11 16:43:46,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1189780.0, ans=0.0 2024-08-11 16:43:50,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1189780.0, ans=0.125 2024-08-11 16:43:54,654 INFO 
[train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3050, loss[loss=0.08923, beats_loss=0.01309, ecapa_loss=0.0001902, whisper_loss=0.07423, over 21026.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01129, ecapa_loss=0.0001929, whisper_loss=0.09242, over 3868680.87 frames. ], batch size: 84, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:43:55,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1189880.0, ans=0.125 2024-08-11 16:43:55,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1189880.0, ans=0.2 2024-08-11 16:43:55,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=12.0 2024-08-11 16:44:01,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1189880.0, ans=0.125 2024-08-11 16:44:14,593 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 16:44:16,836 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.646e+01 3.011e+01 3.406e+01 6.810e+01, threshold=6.022e+01, percent-clipped=0.0 2024-08-11 16:44:21,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1190080.0, ans=0.1 2024-08-11 16:44:34,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1190180.0, ans=0.125 2024-08-11 16:44:35,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1190180.0, ans=0.0 2024-08-11 16:44:44,216 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 16:44:51,290 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 16:45:01,337 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3100, loss[loss=0.09945, beats_loss=0.01074, ecapa_loss=0.0002226, whisper_loss=0.08648, over 21867.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01121, ecapa_loss=0.0001955, whisper_loss=0.09288, over 3847451.67 frames. ], batch size: 90, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:45:12,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1190380.0, ans=0.125 2024-08-11 16:45:15,420 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 16:45:16,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1190480.0, ans=0.2 2024-08-11 16:45:18,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1190480.0, ans=0.0 2024-08-11 16:45:23,202 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 16:45:26,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2024-08-11 16:45:27,386 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 16:45:30,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1190580.0, ans=0.2 2024-08-11 16:45:31,498 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 12 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 16:45:32,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.76 vs. 
limit=22.5 2024-08-11 16:45:40,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1190680.0, ans=0.125 2024-08-11 16:45:52,873 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 28 from LS+wenet, 35 from Vox, 32 fro AS 2024-08-11 16:46:04,863 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 16:46:09,170 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3150, loss[loss=0.1091, beats_loss=0.009228, ecapa_loss=0.0003249, whisper_loss=0.09661, over 17839.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01122, ecapa_loss=0.0001953, whisper_loss=0.09315, over 3867964.53 frames. ], batch size: 79, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:46:12,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1190880.0, ans=0.125 2024-08-11 16:46:31,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.615e+01 2.879e+01 3.586e+01 1.580e+02, threshold=5.758e+01, percent-clipped=2.0 2024-08-11 16:46:52,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1191180.0, ans=0.2 2024-08-11 16:46:57,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1191180.0, ans=0.035 2024-08-11 16:47:02,651 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 16:47:03,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.97 vs. 
limit=22.5 2024-08-11 16:47:11,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1191280.0, ans=0.125 2024-08-11 16:47:15,767 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3200, loss[loss=0.09893, beats_loss=0.01031, ecapa_loss=0.0002388, whisper_loss=0.08623, over 17452.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01117, ecapa_loss=0.000195, whisper_loss=0.09408, over 3888720.46 frames. ], batch size: 71, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:47:17,212 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 16:47:17,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1191380.0, ans=0.09899494936611666 2024-08-11 16:47:39,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1191480.0, ans=0.1 2024-08-11 16:47:44,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1191580.0, ans=0.0 2024-08-11 16:47:47,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1191580.0, ans=0.04949747468305833 2024-08-11 16:48:00,386 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 16:48:02,330 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.07 vs. limit=10.0 2024-08-11 16:48:17,997 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 16:48:22,481 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3250, loss[loss=0.1089, beats_loss=0.01179, ecapa_loss=0.0001769, whisper_loss=0.09529, over 15084.00 frames. 
], tot_loss[loss=0.1081, beats_loss=0.01109, ecapa_loss=0.0001965, whisper_loss=0.09506, over 3882535.01 frames. ], batch size: 59, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:48:40,402 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 14 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 16:48:44,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1191980.0, ans=0.125 2024-08-11 16:48:45,272 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.517e+01 2.867e+01 3.292e+01 6.213e+01, threshold=5.733e+01, percent-clipped=1.0 2024-08-11 16:48:56,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.02 vs. limit=22.5 2024-08-11 16:49:10,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1192180.0, ans=10.0 2024-08-11 16:49:23,978 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-08-11 16:49:29,650 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3300, loss[loss=0.1033, beats_loss=0.01178, ecapa_loss=0.0001937, whisper_loss=0.08956, over 16237.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01113, ecapa_loss=0.0001965, whisper_loss=0.09443, over 3852441.98 frames. ], batch size: 65, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:49:37,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1192380.0, ans=0.07 2024-08-11 16:49:42,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.33 vs. 
limit=15.0 2024-08-11 16:50:10,624 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.049e-02 2024-08-11 16:50:13,251 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 32 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 16:50:13,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1192680.0, ans=0.0 2024-08-11 16:50:25,272 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 16:50:33,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1192780.0, ans=0.0 2024-08-11 16:50:37,021 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3350, loss[loss=0.0998, beats_loss=0.01021, ecapa_loss=0.0001637, whisper_loss=0.08796, over 15840.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01109, ecapa_loss=0.0001955, whisper_loss=0.09402, over 3843033.69 frames. ], batch size: 61, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:50:37,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1192880.0, ans=0.125 2024-08-11 16:50:47,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1192880.0, ans=0.125 2024-08-11 16:50:49,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1192980.0, ans=0.0 2024-08-11 16:50:53,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.93 vs. 
limit=15.0 2024-08-11 16:50:59,281 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.599e+01 2.933e+01 3.463e+01 7.726e+01, threshold=5.866e+01, percent-clipped=2.0 2024-08-11 16:51:03,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1193080.0, ans=0.125 2024-08-11 16:51:07,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1193080.0, ans=0.125 2024-08-11 16:51:11,093 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 19 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 16:51:29,672 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 16:51:32,226 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-11 16:51:42,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3400, loss[loss=0.103, beats_loss=0.01151, ecapa_loss=0.0001948, whisper_loss=0.08951, over 22497.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01124, ecapa_loss=0.0001941, whisper_loss=0.09362, over 3870411.03 frames. ], batch size: 91, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:51:44,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1193380.0, ans=0.2 2024-08-11 16:51:46,630 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 16:52:00,113 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 16:52:11,797 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 16:52:13,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1193580.0, ans=0.125 2024-08-11 16:52:16,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1193580.0, ans=0.125 2024-08-11 16:52:22,998 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 16:52:36,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1193780.0, ans=0.2 2024-08-11 16:52:39,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1193780.0, ans=0.125 2024-08-11 16:52:41,740 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 16:52:44,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1193780.0, ans=0.0 2024-08-11 16:52:47,994 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 16:52:48,948 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3450, loss[loss=0.11, beats_loss=0.009542, ecapa_loss=0.0002194, whisper_loss=0.09829, over 21728.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01117, ecapa_loss=0.0001951, whisper_loss=0.09335, over 3876689.22 frames. ], batch size: 90, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:52:52,819 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
34 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 16:52:54,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1193880.0, ans=0.125 2024-08-11 16:53:08,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1193980.0, ans=0.0 2024-08-11 16:53:11,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.578e+01 2.987e+01 3.563e+01 4.797e+01, threshold=5.975e+01, percent-clipped=0.0 2024-08-11 16:53:22,264 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-11 16:53:23,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1194080.0, ans=0.125 2024-08-11 16:53:31,512 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 16:53:32,705 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-11 16:53:35,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1194180.0, ans=0.125 2024-08-11 16:53:52,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1194280.0, ans=0.125 2024-08-11 16:53:52,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1194280.0, ans=0.0 2024-08-11 16:53:54,395 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3500, loss[loss=0.1124, beats_loss=0.00838, ecapa_loss=0.000189, whisper_loss=0.1021, over 16876.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01119, ecapa_loss=0.000193, whisper_loss=0.09306, over 3864599.12 frames. 
], batch size: 63, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:53:59,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1194380.0, ans=10.0 2024-08-11 16:54:04,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1194380.0, ans=0.0 2024-08-11 16:54:14,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1194480.0, ans=0.125 2024-08-11 16:54:24,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.07 vs. limit=15.0 2024-08-11 16:54:24,340 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.28 vs. limit=15.0 2024-08-11 16:54:26,216 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 16:54:29,335 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.73 vs. limit=10.0 2024-08-11 16:54:36,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1194680.0, ans=0.1 2024-08-11 16:55:00,074 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3550, loss[loss=0.1071, beats_loss=0.01183, ecapa_loss=0.0002366, whisper_loss=0.09294, over 20938.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01144, ecapa_loss=0.0001934, whisper_loss=0.09156, over 3893263.45 frames. ], batch size: 91, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:55:00,221 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 16:55:09,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1194880.0, ans=10.0 2024-08-11 16:55:10,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1194880.0, ans=0.0 2024-08-11 16:55:20,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1194980.0, ans=0.0 2024-08-11 16:55:20,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1194980.0, ans=0.0 2024-08-11 16:55:22,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.768e+01 2.986e+01 3.532e+01 5.359e+01, threshold=5.971e+01, percent-clipped=0.0 2024-08-11 16:56:03,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1195280.0, ans=0.125 2024-08-11 16:56:07,236 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3600, loss[loss=0.1134, beats_loss=0.01156, ecapa_loss=0.0002038, whisper_loss=0.09983, over 20239.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01151, ecapa_loss=0.0001922, whisper_loss=0.09123, over 3875919.18 frames. 
], batch size: 79, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:56:17,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1195380.0, ans=0.07 2024-08-11 16:56:21,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1195480.0, ans=0.07 2024-08-11 16:56:29,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1195480.0, ans=0.125 2024-08-11 16:56:30,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2024-08-11 16:56:31,592 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 16:56:40,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1195580.0, ans=0.125 2024-08-11 16:56:53,914 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0 2024-08-11 16:56:54,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1195680.0, ans=0.2 2024-08-11 16:56:57,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1195680.0, ans=0.2 2024-08-11 16:57:08,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1195780.0, ans=0.1 2024-08-11 16:57:10,313 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 16:57:13,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3650, loss[loss=0.09129, beats_loss=0.01365, ecapa_loss=0.0001475, whisper_loss=0.07616, over 15465.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01144, ecapa_loss=0.0001919, whisper_loss=0.09144, over 3857133.56 frames. ], batch size: 61, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:57:36,255 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.649e+01 3.037e+01 3.697e+01 5.413e+01, threshold=6.074e+01, percent-clipped=0.0 2024-08-11 16:57:45,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1196080.0, ans=0.2 2024-08-11 16:57:49,961 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-11 16:57:55,109 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 7 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-11 16:58:06,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-08-11 16:58:17,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1196280.0, ans=0.0 2024-08-11 16:58:21,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3700, loss[loss=0.1205, beats_loss=0.008917, ecapa_loss=0.0002058, whisper_loss=0.1095, over 22366.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01141, ecapa_loss=0.0001916, whisper_loss=0.09215, over 3828036.72 frames. ], batch size: 87, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:58:24,342 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 16:58:25,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-08-11 16:58:43,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1196480.0, ans=0.2 2024-08-11 16:59:11,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1196680.0, ans=0.025 2024-08-11 16:59:12,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1196680.0, ans=0.125 2024-08-11 16:59:22,522 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-11 16:59:26,496 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 16:59:27,700 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3750, loss[loss=0.1274, beats_loss=0.008966, ecapa_loss=0.0002236, whisper_loss=0.1162, over 22139.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01138, ecapa_loss=0.0001918, whisper_loss=0.09293, over 3871198.96 frames. ], batch size: 88, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:59:39,063 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. 
limit=6.0 2024-08-11 16:59:41,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1196980.0, ans=0.0 2024-08-11 16:59:50,667 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.626e+01 2.806e+01 3.237e+01 4.971e+01, threshold=5.612e+01, percent-clipped=0.0 2024-08-11 17:00:02,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1197080.0, ans=0.125 2024-08-11 17:00:05,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1197080.0, ans=0.05 2024-08-11 17:00:23,777 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 17:00:27,896 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 17:00:31,697 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 17:00:34,262 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3800, loss[loss=0.1072, beats_loss=0.01142, ecapa_loss=0.000213, whisper_loss=0.09362, over 19482.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01134, ecapa_loss=0.0001918, whisper_loss=0.09286, over 3846486.15 frames. ], batch size: 82, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:00:42,323 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.945e-01 2024-08-11 17:00:43,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1197380.0, ans=0.1 2024-08-11 17:00:46,249 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 17:00:49,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1197480.0, ans=0.1 2024-08-11 17:00:50,557 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 17:01:02,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1197580.0, ans=0.125 2024-08-11 17:01:11,838 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 17:01:18,258 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 17:01:22,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1197680.0, ans=0.04949747468305833 2024-08-11 17:01:36,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1197780.0, ans=0.125 2024-08-11 17:01:39,721 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 26 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 17:01:40,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3850, loss[loss=0.1205, beats_loss=0.009681, ecapa_loss=0.0001888, whisper_loss=0.1089, over 17163.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01127, ecapa_loss=0.0001937, whisper_loss=0.09319, over 3844458.39 frames. ], batch size: 67, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:01:54,246 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
24 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 17:01:58,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1197980.0, ans=0.125 2024-08-11 17:02:03,657 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.720e+01 3.010e+01 3.419e+01 7.200e+01, threshold=6.020e+01, percent-clipped=2.0 2024-08-11 17:02:08,246 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 25 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-11 17:02:08,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1198080.0, ans=0.0 2024-08-11 17:02:20,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1198180.0, ans=0.125 2024-08-11 17:02:21,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1198180.0, ans=0.125 2024-08-11 17:02:31,910 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 17:02:34,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.00 vs. limit=22.5 2024-08-11 17:02:36,917 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 17:02:40,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1198280.0, ans=0.1 2024-08-11 17:02:47,635 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3900, loss[loss=0.09329, beats_loss=0.01308, ecapa_loss=0.0001943, whisper_loss=0.07827, over 18323.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01123, ecapa_loss=0.0001942, whisper_loss=0.09383, over 3875672.22 frames. 
], batch size: 76, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:02:51,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1198380.0, ans=0.125 2024-08-11 17:03:05,515 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2024-08-11 17:03:22,285 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 17:03:42,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1198780.0, ans=0.125 2024-08-11 17:03:43,863 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-08-11 17:03:51,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1198780.0, ans=0.0 2024-08-11 17:03:53,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 3950, loss[loss=0.1049, beats_loss=0.01153, ecapa_loss=0.0002197, whisper_loss=0.09114, over 20776.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01118, ecapa_loss=0.0001936, whisper_loss=0.09449, over 3908550.89 frames. ], batch size: 88, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:03:57,762 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 17:04:15,584 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 2.737e+01 3.009e+01 3.546e+01 1.155e+02, threshold=6.019e+01, percent-clipped=1.0 2024-08-11 17:04:22,560 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
23 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 17:04:40,783 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=22.5 2024-08-11 17:04:48,525 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 17:04:51,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1199280.0, ans=0.0 2024-08-11 17:05:00,927 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4000, loss[loss=0.1078, beats_loss=0.009634, ecapa_loss=0.0001713, whisper_loss=0.09643, over 15305.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01112, ecapa_loss=0.0001943, whisper_loss=0.09472, over 3929375.89 frames. ], batch size: 56, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:05:04,122 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 17:05:10,176 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 14 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 17:05:16,958 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 17:05:24,365 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 17:05:31,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1199580.0, ans=0.125 2024-08-11 17:05:37,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1199580.0, ans=0.125 2024-08-11 17:05:45,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1199680.0, ans=0.0 2024-08-11 17:05:50,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5 2024-08-11 17:06:04,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1199780.0, ans=0.125 2024-08-11 17:06:05,470 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.93 vs. limit=10.0 2024-08-11 17:06:08,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1199780.0, ans=0.125 2024-08-11 17:06:11,441 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4050, loss[loss=0.128, beats_loss=0.009943, ecapa_loss=0.0001981, whisper_loss=0.1161, over 20130.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.0111, ecapa_loss=0.0001936, whisper_loss=0.09483, over 3936930.45 frames. 
], batch size: 77, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:06:13,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1199880.0, ans=0.125 2024-08-11 17:06:29,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1199980.0, ans=0.125 2024-08-11 17:06:36,233 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 17:06:37,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.884e+01 3.098e+01 3.625e+01 5.878e+01, threshold=6.196e+01, percent-clipped=0.0 2024-08-11 17:06:41,051 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2024-08-11 17:06:52,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.28 vs. limit=22.5 2024-08-11 17:06:53,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1200080.0, ans=0.1 2024-08-11 17:07:02,437 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 17:07:09,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1200280.0, ans=0.09899494936611666 2024-08-11 17:07:10,545 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 17:07:13,438 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 17:07:14,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1200280.0, ans=0.04949747468305833 2024-08-11 17:07:16,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1200280.0, ans=0.125 2024-08-11 17:07:23,093 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4100, loss[loss=0.1231, beats_loss=0.009019, ecapa_loss=0.0002054, whisper_loss=0.112, over 21389.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01109, ecapa_loss=0.0001948, whisper_loss=0.09459, over 3922925.68 frames. ], batch size: 85, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:07:28,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1200380.0, ans=0.1 2024-08-11 17:07:39,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1200480.0, ans=0.125 2024-08-11 17:07:43,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-11 17:07:46,922 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 24 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-11 17:07:51,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1200580.0, ans=0.125 2024-08-11 17:08:04,870 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.49 vs. 
limit=15.0 2024-08-11 17:08:06,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1200680.0, ans=0.2 2024-08-11 17:08:26,667 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.18 vs. limit=22.5 2024-08-11 17:08:27,275 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 17:08:30,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5 2024-08-11 17:08:32,514 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4150, loss[loss=0.1272, beats_loss=0.01056, ecapa_loss=0.0001594, whisper_loss=0.115, over 19816.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01115, ecapa_loss=0.0001949, whisper_loss=0.09431, over 3906548.94 frames. ], batch size: 75, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:08:47,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1200980.0, ans=0.125 2024-08-11 17:08:55,216 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.681e+01 3.148e+01 3.707e+01 5.413e+01, threshold=6.297e+01, percent-clipped=0.0 2024-08-11 17:09:09,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2024-08-11 17:09:17,676 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 17:09:23,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1201180.0, ans=0.0 2024-08-11 17:09:26,479 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 17:09:30,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1201280.0, ans=0.1 2024-08-11 17:09:33,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1201280.0, ans=0.0 2024-08-11 17:09:35,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1201280.0, ans=0.1 2024-08-11 17:09:42,980 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4200, loss[loss=0.1183, beats_loss=0.009664, ecapa_loss=0.0001869, whisper_loss=0.1067, over 17660.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01114, ecapa_loss=0.0001952, whisper_loss=0.09427, over 3913960.31 frames. ], batch size: 66, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:10:07,288 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 17:10:30,220 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.11 vs. limit=8.0 2024-08-11 17:10:30,586 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-11 17:10:35,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.70 vs. limit=15.0 2024-08-11 17:10:36,117 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 17:10:52,702 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4250, loss[loss=0.1122, beats_loss=0.01251, ecapa_loss=0.0001853, whisper_loss=0.0978, over 17319.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01121, ecapa_loss=0.0001941, whisper_loss=0.09354, over 3918869.71 frames. 
], batch size: 67, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:11:11,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1201980.0, ans=0.125 2024-08-11 17:11:14,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-11 17:11:16,324 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.599e+01 2.986e+01 3.415e+01 8.403e+01, threshold=5.972e+01, percent-clipped=2.0 2024-08-11 17:11:45,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1202180.0, ans=0.125 2024-08-11 17:11:53,486 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-11 17:11:56,371 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 16 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-11 17:11:58,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1202280.0, ans=0.125 2024-08-11 17:12:01,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4300, loss[loss=0.1181, beats_loss=0.01248, ecapa_loss=0.0002102, whisper_loss=0.1035, over 21971.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01119, ecapa_loss=0.0001946, whisper_loss=0.09353, over 3895105.37 frames. ], batch size: 91, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:12:11,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1202380.0, ans=0.0 2024-08-11 17:12:23,592 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
18 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 17:12:35,001 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-08-11 17:12:37,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2024-08-11 17:12:40,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1202580.0, ans=0.125 2024-08-11 17:12:42,465 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-11 17:12:43,847 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 17:12:51,934 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 17:12:56,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1202780.0, ans=0.125 2024-08-11 17:13:11,057 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4350, loss[loss=0.117, beats_loss=0.009853, ecapa_loss=0.0002158, whisper_loss=0.105, over 19445.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01122, ecapa_loss=0.0001936, whisper_loss=0.09273, over 3890237.00 frames. ], batch size: 75, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:13:33,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1202980.0, ans=0.1 2024-08-11 17:13:35,505 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.556e+01 3.068e+01 3.501e+01 5.955e+01, threshold=6.137e+01, percent-clipped=0.0 2024-08-11 17:13:38,823 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
26 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 17:13:41,349 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 25 from LS+wenet, 6 from Vox, 26 fro AS 2024-08-11 17:13:45,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1203080.0, ans=0.1 2024-08-11 17:13:46,885 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-11 17:13:57,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1203180.0, ans=0.125 2024-08-11 17:14:03,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1203180.0, ans=0.0 2024-08-11 17:14:09,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1203280.0, ans=0.125 2024-08-11 17:14:10,151 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-11 17:14:21,405 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4400, loss[loss=0.09944, beats_loss=0.01136, ecapa_loss=0.0001832, whisper_loss=0.08625, over 20025.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0112, ecapa_loss=0.0001943, whisper_loss=0.0931, over 3896215.51 frames. 
], batch size: 81, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:14:26,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1203380.0, ans=0.125 2024-08-11 17:14:27,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1203380.0, ans=0.1 2024-08-11 17:14:38,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1203480.0, ans=0.07 2024-08-11 17:15:04,013 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 17:15:28,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.29 vs. limit=15.0 2024-08-11 17:15:31,519 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.714e-02 2024-08-11 17:15:34,277 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4450, loss[loss=0.1074, beats_loss=0.01171, ecapa_loss=0.0001755, whisper_loss=0.09396, over 21544.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0112, ecapa_loss=0.0001941, whisper_loss=0.09316, over 3915666.59 frames. ], batch size: 88, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:15:35,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.75 vs. 
limit=15.0 2024-08-11 17:15:43,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1203880.0, ans=0.1 2024-08-11 17:15:46,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1203880.0, ans=0.07 2024-08-11 17:16:01,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1203980.0, ans=0.125 2024-08-11 17:16:02,173 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.738e+01 3.141e+01 3.648e+01 6.257e+01, threshold=6.281e+01, percent-clipped=1.0 2024-08-11 17:16:18,271 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.432e-01 2024-08-11 17:16:27,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1204180.0, ans=0.0 2024-08-11 17:16:39,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1204280.0, ans=0.125 2024-08-11 17:16:46,484 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 17:16:54,229 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4500, loss[loss=0.123, beats_loss=0.01012, ecapa_loss=0.0002035, whisper_loss=0.1108, over 22765.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01117, ecapa_loss=0.0001931, whisper_loss=0.09389, over 3947051.04 frames. ], batch size: 89, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:17:07,703 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.383e+01 2024-08-11 17:17:20,831 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-11 17:17:22,912 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-11 17:17:26,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1204580.0, ans=0.125 2024-08-11 17:17:48,254 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 17:18:03,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1204780.0, ans=0.0 2024-08-11 17:18:04,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1204780.0, ans=0.125 2024-08-11 17:18:11,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1204780.0, ans=0.0 2024-08-11 17:18:17,074 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4550, loss[loss=0.09727, beats_loss=0.0103, ecapa_loss=0.000206, whisper_loss=0.08491, over 16623.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01124, ecapa_loss=0.0001919, whisper_loss=0.09333, over 3941017.46 frames. ], batch size: 63, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:18:19,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1204880.0, ans=0.0 2024-08-11 17:18:42,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1204980.0, ans=0.125 2024-08-11 17:18:44,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.743e+01 3.155e+01 3.839e+01 5.758e+01, threshold=6.310e+01, percent-clipped=0.0 2024-08-11 17:18:46,605 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
17 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-11 17:18:46,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1204980.0, ans=0.0 2024-08-11 17:18:57,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1205080.0, ans=0.125 2024-08-11 17:18:58,453 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 17:19:13,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1205180.0, ans=0.0 2024-08-11 17:19:16,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1205180.0, ans=0.2 2024-08-11 17:19:18,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2024-08-11 17:19:25,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1205280.0, ans=0.125 2024-08-11 17:19:25,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1205280.0, ans=0.2 2024-08-11 17:19:25,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1205280.0, ans=0.05 2024-08-11 17:19:34,381 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4600, loss[loss=0.1188, beats_loss=0.008327, ecapa_loss=0.0001928, whisper_loss=0.1086, over 19223.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01128, ecapa_loss=0.0001929, whisper_loss=0.09306, over 3920937.46 frames. 
], batch size: 73, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:19:36,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1205380.0, ans=0.1 2024-08-11 17:19:48,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=12.0 2024-08-11 17:20:19,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1205680.0, ans=0.125 2024-08-11 17:20:45,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1205780.0, ans=0.125 2024-08-11 17:20:54,449 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4650, loss[loss=0.1071, beats_loss=0.01285, ecapa_loss=0.0001872, whisper_loss=0.09238, over 22277.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0113, ecapa_loss=0.0001933, whisper_loss=0.09261, over 3925867.76 frames. 
], batch size: 89, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:20:55,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1205880.0, ans=0.125 2024-08-11 17:21:15,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1205980.0, ans=0.125 2024-08-11 17:21:19,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1205980.0, ans=10.0 2024-08-11 17:21:23,516 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.651e+01 2.897e+01 3.330e+01 4.454e+01, threshold=5.794e+01, percent-clipped=0.0 2024-08-11 17:21:46,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1206180.0, ans=0.025 2024-08-11 17:21:46,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1206180.0, ans=0.09899494936611666 2024-08-11 17:22:01,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1206280.0, ans=0.0 2024-08-11 17:22:06,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1206280.0, ans=0.125 2024-08-11 17:22:12,753 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4700, loss[loss=0.11, beats_loss=0.009353, ecapa_loss=0.0002173, whisper_loss=0.09848, over 17583.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0113, ecapa_loss=0.0001931, whisper_loss=0.0928, over 3940129.72 frames. 
], batch size: 72, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:22:14,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1206380.0, ans=0.125 2024-08-11 17:22:17,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1206380.0, ans=15.0 2024-08-11 17:22:31,676 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 17:22:33,330 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 17:22:38,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1206580.0, ans=0.2 2024-08-11 17:22:46,820 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.911e+05 2024-08-11 17:22:52,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1206680.0, ans=0.0 2024-08-11 17:22:54,956 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.95 vs. limit=22.5 2024-08-11 17:22:58,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1206680.0, ans=0.125 2024-08-11 17:23:02,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1206680.0, ans=0.0 2024-08-11 17:23:15,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1206780.0, ans=0.125 2024-08-11 17:23:19,370 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4750, loss[loss=0.08956, beats_loss=0.01358, ecapa_loss=0.0001614, whisper_loss=0.07437, over 20111.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.01128, ecapa_loss=0.0001941, whisper_loss=0.09278, over 3945367.27 frames. ], batch size: 81, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:23:36,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1206980.0, ans=0.1 2024-08-11 17:23:42,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.773e+01 3.300e+01 3.701e+01 2.356e+02, threshold=6.600e+01, percent-clipped=1.0 2024-08-11 17:23:48,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1207080.0, ans=0.0 2024-08-11 17:23:52,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.60 vs. limit=22.5 2024-08-11 17:23:56,140 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 17:24:05,642 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.64 vs. limit=10.0 2024-08-11 17:24:22,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1207280.0, ans=0.125 2024-08-11 17:24:26,106 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4800, loss[loss=0.1292, beats_loss=0.008707, ecapa_loss=0.0001937, whisper_loss=0.1186, over 23534.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01118, ecapa_loss=0.0001951, whisper_loss=0.09361, over 3913043.94 frames. ], batch size: 91, lr: 7.09e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:24:28,807 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 17:24:31,432 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-11 17:24:32,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2024-08-11 17:24:41,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1207480.0, ans=0.125 2024-08-11 17:24:45,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1207480.0, ans=0.125 2024-08-11 17:24:52,818 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 17:24:59,551 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 17:25:02,465 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 17:25:08,931 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 17:25:09,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=12.0 2024-08-11 17:25:10,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1207680.0, ans=0.07 2024-08-11 17:25:14,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1207680.0, ans=0.125 2024-08-11 17:25:19,599 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-11 17:25:20,870 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 17:25:23,723 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
17 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 17:25:26,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1207780.0, ans=0.2 2024-08-11 17:25:32,943 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4850, loss[loss=0.1378, beats_loss=0.009333, ecapa_loss=0.0002293, whisper_loss=0.1262, over 22821.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01127, ecapa_loss=0.0001941, whisper_loss=0.09262, over 3897607.32 frames. ], batch size: 88, lr: 7.09e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:25:48,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1207980.0, ans=0.0 2024-08-11 17:25:55,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.586e+01 2.829e+01 3.279e+01 4.850e+01, threshold=5.658e+01, percent-clipped=0.0 2024-08-11 17:26:12,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1208180.0, ans=0.1 2024-08-11 17:26:22,468 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 17:26:23,816 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 17:26:26,421 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 17:26:37,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=12.0 2024-08-11 17:26:39,310 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4900, loss[loss=0.1035, beats_loss=0.008254, ecapa_loss=0.0002127, whisper_loss=0.09307, over 16531.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01139, ecapa_loss=0.0001918, whisper_loss=0.09236, over 3895843.61 frames. 
], batch size: 61, lr: 7.09e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:26:41,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1208380.0, ans=0.125 2024-08-11 17:26:41,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1208380.0, ans=0.04949747468305833 2024-08-11 17:26:47,886 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-11 17:26:54,841 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 17:26:59,089 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-11 17:27:06,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=12.0 2024-08-11 17:27:31,156 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 17:27:31,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1208680.0, ans=0.125 2024-08-11 17:27:36,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-08-11 17:27:50,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 4950, loss[loss=0.1101, beats_loss=0.01064, ecapa_loss=0.0002041, whisper_loss=0.09739, over 21718.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01138, ecapa_loss=0.000193, whisper_loss=0.09218, over 3896303.56 frames. 
], batch size: 84, lr: 7.09e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:27:54,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1208880.0, ans=0.125 2024-08-11 17:27:59,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1208880.0, ans=0.2 2024-08-11 17:28:08,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1208980.0, ans=0.1 2024-08-11 17:28:08,966 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 21 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-11 17:28:14,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1208980.0, ans=0.125 2024-08-11 17:28:15,172 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.567e+01 2.832e+01 3.214e+01 4.886e+01, threshold=5.664e+01, percent-clipped=0.0 2024-08-11 17:28:16,049 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.759e-01 2024-08-11 17:28:20,632 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.977e+00 2024-08-11 17:28:47,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=15.0 2024-08-11 17:28:53,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1209280.0, ans=0.125 2024-08-11 17:29:04,880 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5000, loss[loss=0.07873, beats_loss=0.01146, ecapa_loss=0.000187, whisper_loss=0.06539, over 17209.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01137, ecapa_loss=0.0001931, whisper_loss=0.09203, over 3887524.05 frames. 
], batch size: 69, lr: 7.09e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:29:27,414 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 17:29:51,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1209680.0, ans=0.125 2024-08-11 17:30:14,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1209780.0, ans=0.2 2024-08-11 17:30:14,461 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.492e-01 2024-08-11 17:30:18,079 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 17:30:19,242 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5050, loss[loss=0.0886, beats_loss=0.01335, ecapa_loss=0.0001541, whisper_loss=0.07371, over 14765.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01138, ecapa_loss=0.0001927, whisper_loss=0.09234, over 3882499.84 frames. 
], batch size: 56, lr: 7.09e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:30:20,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1209880.0, ans=0.125 2024-08-11 17:30:22,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1209880.0, ans=0.5 2024-08-11 17:30:44,623 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.569e+01 2.847e+01 3.482e+01 7.100e+01, threshold=5.695e+01, percent-clipped=3.0 2024-08-11 17:30:45,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1209980.0, ans=0.125 2024-08-11 17:30:50,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1210080.0, ans=0.125 2024-08-11 17:30:53,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1210080.0, ans=0.125 2024-08-11 17:31:19,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1210280.0, ans=0.0 2024-08-11 17:31:27,581 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 33 from Vox, 31 fro AS 2024-08-11 17:31:30,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1210280.0, ans=0.1 2024-08-11 17:31:35,322 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5100, loss[loss=0.1095, beats_loss=0.01156, ecapa_loss=0.0002, whisper_loss=0.09594, over 19182.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01137, ecapa_loss=0.0001922, whisper_loss=0.09263, over 3847407.16 frames. 
], batch size: 77, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:31:46,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1210380.0, ans=0.1 2024-08-11 17:32:04,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.19 vs. limit=15.0 2024-08-11 17:32:10,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1210580.0, ans=0.1 2024-08-11 17:32:18,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1210580.0, ans=0.125 2024-08-11 17:32:20,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1210580.0, ans=0.1 2024-08-11 17:32:20,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1210580.0, ans=0.125 2024-08-11 17:32:31,907 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 17:32:41,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1210780.0, ans=0.09899494936611666 2024-08-11 17:32:52,757 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 17:32:55,103 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5150, loss[loss=0.1193, beats_loss=0.01021, ecapa_loss=0.0001878, whisper_loss=0.1072, over 23507.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01133, ecapa_loss=0.0001907, whisper_loss=0.09306, over 3887404.16 frames. 
], batch size: 92, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:33:17,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1210980.0, ans=0.0 2024-08-11 17:33:22,090 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.614e+01 3.081e+01 3.730e+01 5.554e+01, threshold=6.161e+01, percent-clipped=0.0 2024-08-11 17:33:24,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1210980.0, ans=0.125 2024-08-11 17:33:30,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1211080.0, ans=0.125 2024-08-11 17:33:39,173 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 17:34:04,703 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 17:34:08,139 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.606e-01 2024-08-11 17:34:11,832 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5200, loss[loss=0.1007, beats_loss=0.009944, ecapa_loss=0.0001867, whisper_loss=0.08894, over 13647.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01127, ecapa_loss=0.0001913, whisper_loss=0.0929, over 3893901.87 frames. 
], batch size: 54, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:34:18,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1211380.0, ans=0.09899494936611666 2024-08-11 17:34:22,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1211380.0, ans=0.1 2024-08-11 17:34:27,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1211480.0, ans=0.1 2024-08-11 17:34:35,129 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 17:34:44,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1211580.0, ans=0.0 2024-08-11 17:34:58,267 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 17:35:15,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1211780.0, ans=0.125 2024-08-11 17:35:26,422 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.64 vs. limit=15.0 2024-08-11 17:35:29,834 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5250, loss[loss=0.09884, beats_loss=0.01186, ecapa_loss=0.0001499, whisper_loss=0.08548, over 15023.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01125, ecapa_loss=0.0001912, whisper_loss=0.09279, over 3856306.01 frames. ], batch size: 57, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:35:45,036 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.92 vs. limit=6.0 2024-08-11 17:35:51,195 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-11 17:35:57,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.672e+01 3.061e+01 3.448e+01 6.321e+01, threshold=6.122e+01, percent-clipped=2.0 2024-08-11 17:35:59,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1211980.0, ans=0.07 2024-08-11 17:36:05,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1212080.0, ans=15.0 2024-08-11 17:36:34,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1212280.0, ans=0.2 2024-08-11 17:36:42,375 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0 2024-08-11 17:36:48,075 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5300, loss[loss=0.1191, beats_loss=0.008519, ecapa_loss=0.0002343, whisper_loss=0.1082, over 22545.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01124, ecapa_loss=0.0001901, whisper_loss=0.09295, over 3859487.26 frames. 
], batch size: 91, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:36:50,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1212380.0, ans=10.0 2024-08-11 17:36:54,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1212380.0, ans=0.2 2024-08-11 17:37:02,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1212480.0, ans=0.125 2024-08-11 17:37:20,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1212580.0, ans=0.125 2024-08-11 17:37:43,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1212680.0, ans=0.0 2024-08-11 17:37:46,703 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.23 vs. limit=22.5 2024-08-11 17:38:04,604 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5350, loss[loss=0.1177, beats_loss=0.009887, ecapa_loss=0.0002023, whisper_loss=0.1058, over 22548.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01126, ecapa_loss=0.0001903, whisper_loss=0.09249, over 3889908.38 frames. 
], batch size: 88, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:38:15,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1212880.0, ans=0.125 2024-08-11 17:38:23,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1212980.0, ans=0.1 2024-08-11 17:38:29,923 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.524e+01 2.904e+01 3.271e+01 6.276e+01, threshold=5.808e+01, percent-clipped=1.0 2024-08-11 17:38:32,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=1212980.0, ans=0.2 2024-08-11 17:38:44,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1213080.0, ans=0.07 2024-08-11 17:38:53,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.75 vs. limit=12.0 2024-08-11 17:39:14,399 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 17:39:25,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5400, loss[loss=0.1197, beats_loss=0.01027, ecapa_loss=0.0002245, whisper_loss=0.1072, over 20690.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01126, ecapa_loss=0.0001899, whisper_loss=0.09294, over 3915805.77 frames. ], batch size: 87, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:39:41,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1213480.0, ans=0.125 2024-08-11 17:40:31,887 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
20 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-11 17:40:41,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1213780.0, ans=0.125 2024-08-11 17:40:42,691 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-11 17:40:44,025 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5450, loss[loss=0.0924, beats_loss=0.0114, ecapa_loss=0.0002223, whisper_loss=0.07878, over 21342.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01123, ecapa_loss=0.0001902, whisper_loss=0.09308, over 3905053.67 frames. ], batch size: 91, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:40:52,153 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 17:40:55,407 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 17:41:11,875 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+01 2.638e+01 2.966e+01 3.379e+01 5.199e+01, threshold=5.933e+01, percent-clipped=0.0 2024-08-11 17:41:19,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1214080.0, ans=0.95 2024-08-11 17:41:24,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1214080.0, ans=0.0 2024-08-11 17:41:32,211 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.65 vs. limit=22.5 2024-08-11 17:41:34,776 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-11 17:41:53,313 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.76 vs. 
limit=22.5 2024-08-11 17:41:54,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1214280.0, ans=0.07 2024-08-11 17:41:56,209 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-08-11 17:41:58,318 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=22.5 2024-08-11 17:42:03,418 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5500, loss[loss=0.1019, beats_loss=0.0109, ecapa_loss=0.0002211, whisper_loss=0.08881, over 14767.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01126, ecapa_loss=0.0001903, whisper_loss=0.09253, over 3860022.66 frames. ], batch size: 59, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:42:12,245 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-11 17:42:25,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1214480.0, ans=0.125 2024-08-11 17:42:36,566 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-11 17:42:56,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1214680.0, ans=0.125 2024-08-11 17:42:59,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1214680.0, ans=0.125 2024-08-11 17:43:00,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1214680.0, ans=0.2 2024-08-11 17:43:10,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1214780.0, ans=0.1 2024-08-11 17:43:17,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1214780.0, ans=0.125 2024-08-11 17:43:20,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1214780.0, ans=0.125 2024-08-11 17:43:25,412 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5550, loss[loss=0.1027, beats_loss=0.01244, ecapa_loss=0.0001721, whisper_loss=0.08856, over 22941.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01127, ecapa_loss=0.0001908, whisper_loss=0.09272, over 3896368.38 frames. ], batch size: 92, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:43:35,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1214880.0, ans=0.0 2024-08-11 17:43:42,765 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
25 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 17:43:49,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1214980.0, ans=0.1 2024-08-11 17:43:53,715 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.662e+01 2.905e+01 3.480e+01 6.680e+01, threshold=5.810e+01, percent-clipped=1.0 2024-08-11 17:43:58,013 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 34 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 17:44:12,274 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2024-08-11 17:44:19,076 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 17:44:19,419 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.468e+05 2024-08-11 17:44:28,318 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 17:44:33,188 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-11 17:44:38,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1215280.0, ans=0.0 2024-08-11 17:44:41,954 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 17:44:46,485 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5600, loss[loss=0.09818, beats_loss=0.01184, ecapa_loss=0.000205, whisper_loss=0.08429, over 18570.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01124, ecapa_loss=0.0001919, whisper_loss=0.09297, over 3885215.31 frames. 
], batch size: 80, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:44:56,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1215380.0, ans=0.07 2024-08-11 17:45:11,622 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 17:45:13,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1215480.0, ans=0.125 2024-08-11 17:45:26,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0 2024-08-11 17:45:30,469 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-11 17:45:38,859 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-11 17:45:56,484 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 17:46:05,607 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5650, loss[loss=0.1085, beats_loss=0.01082, ecapa_loss=0.0001805, whisper_loss=0.09586, over 20538.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01128, ecapa_loss=0.0001921, whisper_loss=0.09235, over 3909244.23 frames. 
], batch size: 81, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:46:06,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1215880.0, ans=0.125 2024-08-11 17:46:31,934 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.596e+01 3.008e+01 3.518e+01 5.757e+01, threshold=6.016e+01, percent-clipped=0.0 2024-08-11 17:46:51,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1216180.0, ans=0.1 2024-08-11 17:47:01,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1216180.0, ans=0.07 2024-08-11 17:47:03,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1216180.0, ans=0.125 2024-08-11 17:47:08,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1216280.0, ans=0.0 2024-08-11 17:47:22,673 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5700, loss[loss=0.08886, beats_loss=0.01462, ecapa_loss=0.0001754, whisper_loss=0.07249, over 20106.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01128, ecapa_loss=0.0001941, whisper_loss=0.09226, over 3930385.04 frames. ], batch size: 84, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:47:34,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1216380.0, ans=0.125 2024-08-11 17:47:46,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.67 vs. 
limit=15.0 2024-08-11 17:47:48,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1216480.0, ans=0.0 2024-08-11 17:47:55,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1216580.0, ans=0.125 2024-08-11 17:48:17,462 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 17:48:21,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1216680.0, ans=0.0 2024-08-11 17:48:28,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1216780.0, ans=0.125 2024-08-11 17:48:36,096 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 17:48:38,017 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 28 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-11 17:48:42,709 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5750, loss[loss=0.1083, beats_loss=0.01046, ecapa_loss=0.0001807, whisper_loss=0.09604, over 14588.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0113, ecapa_loss=0.0001942, whisper_loss=0.09204, over 3919570.22 frames. ], batch size: 57, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:49:05,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.13 vs. 
limit=15.0 2024-08-11 17:49:08,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.576e+01 2.968e+01 3.290e+01 6.597e+01, threshold=5.936e+01, percent-clipped=1.0 2024-08-11 17:49:14,758 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.36 vs. limit=22.5 2024-08-11 17:49:17,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1217080.0, ans=0.0 2024-08-11 17:49:41,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1217180.0, ans=0.125 2024-08-11 17:49:47,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1217280.0, ans=0.0 2024-08-11 17:49:57,754 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 17:50:00,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2024-08-11 17:50:00,554 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5800, loss[loss=0.113, beats_loss=0.0128, ecapa_loss=0.0001987, whisper_loss=0.09818, over 21733.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01133, ecapa_loss=0.0001948, whisper_loss=0.09206, over 3926439.91 frames. 
], batch size: 89, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:50:14,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1217480.0, ans=0.1 2024-08-11 17:50:38,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1217580.0, ans=0.1 2024-08-11 17:50:48,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1217680.0, ans=0.125 2024-08-11 17:51:02,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1217780.0, ans=0.125 2024-08-11 17:51:15,037 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5850, loss[loss=0.1103, beats_loss=0.01099, ecapa_loss=0.0001793, whisper_loss=0.09754, over 19674.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01142, ecapa_loss=0.0001934, whisper_loss=0.091, over 3906634.61 frames. ], batch size: 80, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:51:39,916 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.535e+01 2.906e+01 3.221e+01 4.693e+01, threshold=5.811e+01, percent-clipped=0.0 2024-08-11 17:51:42,988 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 17:51:47,350 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 17:51:49,382 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-11 17:52:01,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1218180.0, ans=0.125 2024-08-11 17:52:05,583 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
19 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-11 17:52:09,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1218180.0, ans=0.1 2024-08-11 17:52:17,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1218280.0, ans=0.125 2024-08-11 17:52:21,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1218280.0, ans=0.0 2024-08-11 17:52:29,310 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5900, loss[loss=0.09624, beats_loss=0.01097, ecapa_loss=0.0002421, whisper_loss=0.08285, over 17995.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01146, ecapa_loss=0.000193, whisper_loss=0.09092, over 3901767.64 frames. ], batch size: 75, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:52:34,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1218380.0, ans=0.0 2024-08-11 17:52:39,135 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-11 17:53:34,044 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2024-08-11 17:53:35,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1218780.0, ans=0.125 2024-08-11 17:53:36,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1218780.0, ans=0.04949747468305833 2024-08-11 17:53:47,883 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 5950, loss[loss=0.08154, beats_loss=0.01246, ecapa_loss=0.0002163, whisper_loss=0.06692, over 15211.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01154, ecapa_loss=0.0001916, whisper_loss=0.09024, over 3876658.21 frames. ], batch size: 63, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:53:53,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1218880.0, ans=0.125 2024-08-11 17:53:56,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1218880.0, ans=0.1 2024-08-11 17:53:56,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1218880.0, ans=0.0 2024-08-11 17:54:13,667 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.543e+01 2.844e+01 3.292e+01 4.976e+01, threshold=5.688e+01, percent-clipped=0.0 2024-08-11 17:54:23,062 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 17:54:26,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1219080.0, ans=0.1 2024-08-11 17:54:29,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-11 17:54:39,409 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 17:54:48,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1219280.0, ans=0.125 2024-08-11 17:55:00,491 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 17:55:02,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.85 vs. 
limit=22.5 2024-08-11 17:55:03,127 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6000, loss[loss=0.127, beats_loss=0.009148, ecapa_loss=0.0001727, whisper_loss=0.1161, over 22253.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01145, ecapa_loss=0.0001913, whisper_loss=0.09124, over 3859719.90 frames. ], batch size: 84, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:55:03,128 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 17:55:39,265 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on ASR_libri: loss=0.2573, beats_loss=0, ecapa_loss=0.0006361, whisper_loss=0.2509, over 922467.00 frames. 2024-08-11 17:55:57,631 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on SV_voxceleb1: loss=0.005086, beats_loss=0, ecapa_loss=0.0005086, whisper_loss=0, over 939242.00 frames. 2024-08-11 17:57:42,106 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on AT_audioset: loss=0.02513, beats_loss=0.02513, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 17:57:42,111 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 17:58:00,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0 2024-08-11 17:58:02,957 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 17:58:08,366 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=12.0 2024-08-11 17:58:26,079 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 17:58:26,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1219580.0, ans=0.125 2024-08-11 17:58:27,998 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 26 from Vox, 30 from AS
2024-08-11 17:58:31,056 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 14 from Vox, 33 from AS
2024-08-11 17:58:56,732 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 41 from LS+wenet, 16 from Vox, 32 from AS
2024-08-11 17:58:58,789 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 24 from Vox, 25 from AS
2024-08-11 17:58:59,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1219780.0, ans=0.0
2024-08-11 17:59:06,886 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6050, loss[loss=0.1363, beats_loss=0.009036, ecapa_loss=0.0002117, whisper_loss=0.1251, over 22858.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01131, ecapa_loss=0.0001919, whisper_loss=0.09236, over 3869395.42 frames. ], batch size: 89, lr: 7.06e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:59:11,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1219880.0, ans=0.07
2024-08-11 17:59:25,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1219980.0, ans=0.125
2024-08-11 17:59:34,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.570e+01 2.877e+01 3.382e+01 4.916e+01, threshold=5.754e+01, percent-clipped=0.0
2024-08-11 17:59:39,529 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 17 from Vox, 48 from AS
2024-08-11 17:59:49,496 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 25 from Vox, 29 from AS
2024-08-11 17:59:51,400 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 from AS
2024-08-11 18:00:06,205 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts.
27 from LS+wenet, 18 from Vox, 44 from AS
2024-08-11 18:00:29,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6100, loss[loss=0.1127, beats_loss=0.01073, ecapa_loss=0.0001956, whisper_loss=0.1, over 22860.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01135, ecapa_loss=0.0001919, whisper_loss=0.09249, over 3887720.29 frames. ], batch size: 89, lr: 7.06e-03, grad_scale: 1.152921504606847e+18
2024-08-11 18:00:31,308 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 21 from Vox, 27 from AS
2024-08-11 18:00:36,866 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 25 from Vox, 30 from AS
2024-08-11 18:01:02,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1220580.0, ans=0.05
2024-08-11 18:01:07,506 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 from AS
2024-08-11 18:01:20,266 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 from AS
2024-08-11 18:01:47,970 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 15 from Vox, 43 from AS
2024-08-11 18:01:52,821 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6150, loss[loss=0.09816, beats_loss=0.01336, ecapa_loss=0.0001739, whisper_loss=0.08305, over 22214.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01142, ecapa_loss=0.0001916, whisper_loss=0.09167, over 3892950.86 frames. ], batch size: 93, lr: 7.05e-03, grad_scale: 1.152921504606847e+18
2024-08-11 18:01:53,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1220880.0, ans=0.125
2024-08-11 18:02:07,326 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
30 from LS+wenet, 19 from Vox, 45 from AS
2024-08-11 18:02:07,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1220980.0, ans=0.125
2024-08-11 18:02:08,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0
2024-08-11 18:02:09,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1220980.0, ans=0.0
2024-08-11 18:02:12,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1220980.0, ans=0.2
2024-08-11 18:02:19,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1220980.0, ans=0.1
2024-08-11 18:02:20,166 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 2.665e+01 2.922e+01 3.415e+01 6.689e+01, threshold=5.844e+01, percent-clipped=1.0
2024-08-11 18:02:24,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1221080.0, ans=0.125
2024-08-11 18:02:27,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1221080.0, ans=0.0
2024-08-11 18:02:45,906 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 from AS
2024-08-11 18:03:08,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0
2024-08-11 18:03:11,536 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6200, loss[loss=0.1108, beats_loss=0.01278, ecapa_loss=0.0001897, whisper_loss=0.09613, over 23604.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01134, ecapa_loss=0.0001916, whisper_loss=0.09233, over 3886240.74 frames.
], batch size: 94, lr: 7.05e-03, grad_scale: 1.152921504606847e+18
2024-08-11 18:03:14,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1221380.0, ans=0.1
2024-08-11 18:03:14,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.95 vs. limit=15.0
2024-08-11 18:03:21,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1221380.0, ans=0.0
2024-08-11 18:03:31,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1221480.0, ans=0.1
2024-08-11 18:03:34,051 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 26 from Vox, 34 from AS
2024-08-11 18:03:37,443 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 16 from Vox, 47 from AS
2024-08-11 18:03:38,803 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 from AS
2024-08-11 18:03:58,524 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0
2024-08-11 18:04:01,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1221680.0, ans=0.1
2024-08-11 18:04:13,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1221680.0, ans=0.125
2024-08-11 18:04:16,728 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 16 from Vox, 35 from AS
2024-08-11 18:04:31,912 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6250, loss[loss=0.09279, beats_loss=0.0117, ecapa_loss=0.0002042, whisper_loss=0.07905, over 16450.00 frames.
], tot_loss[loss=0.1065, beats_loss=0.01131, ecapa_loss=0.0001914, whisper_loss=0.09324, over 3889476.01 frames. ], batch size: 68, lr: 7.05e-03, grad_scale: 1.152921504606847e+18
2024-08-11 18:04:35,513 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 21 from LS+wenet, 14 from Vox, 21 from AS
2024-08-11 18:04:42,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1221880.0, ans=0.07
2024-08-11 18:04:56,013 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 15 from Vox, 22 from AS
2024-08-11 18:04:58,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.590e+01 2.864e+01 3.315e+01 6.460e+01, threshold=5.728e+01, percent-clipped=1.0
2024-08-11 18:05:04,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1222080.0, ans=0.125
2024-08-11 18:05:21,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1222180.0, ans=0.125
2024-08-11 18:05:24,474 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 27 from Vox, 31 from AS
2024-08-11 18:05:52,348 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6300, loss[loss=0.1083, beats_loss=0.01151, ecapa_loss=0.0001895, whisper_loss=0.09492, over 21052.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01126, ecapa_loss=0.0001913, whisper_loss=0.09334, over 3884749.73 frames. ], batch size: 85, lr: 7.05e-03, grad_scale: 1.152921504606847e+18
2024-08-11 18:06:36,240 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 from AS
2024-08-11 18:06:52,805 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 from AS
2024-08-11 18:06:55,119 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts.
18 from LS+wenet, 21 from Vox, 36 from AS
2024-08-11 18:07:02,242 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 14 from Vox, 40 from AS
2024-08-11 18:07:35,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1222780.0, ans=0.1
2024-08-11 18:07:46,301 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6350, loss[loss=0.1254, beats_loss=0.01021, ecapa_loss=0.0002447, whisper_loss=0.1127, over 22882.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01135, ecapa_loss=0.000191, whisper_loss=0.09261, over 3873050.98 frames. ], batch size: 94, lr: 7.05e-03, grad_scale: 1.152921504606847e+18
2024-08-11 18:08:03,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1222980.0, ans=0.025
2024-08-11 18:08:17,756 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.576e+01 2.972e+01 3.431e+01 4.977e+01, threshold=5.945e+01, percent-clipped=0.0
2024-08-11 18:08:25,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1223080.0, ans=0.125
2024-08-11 18:08:41,418 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 19 from Vox, 29 from AS
2024-08-11 18:08:56,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=15.0
2024-08-11 18:09:00,718 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts.
18 from LS+wenet, 22 from Vox, 26 from AS
2024-08-11 18:09:18,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1223280.0, ans=0.125
2024-08-11 18:09:21,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1223280.0, ans=0.1
2024-08-11 18:09:32,243 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6400, loss[loss=0.113, beats_loss=0.01081, ecapa_loss=0.000184, whisper_loss=0.1004, over 23041.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01139, ecapa_loss=0.0001899, whisper_loss=0.09263, over 3911925.99 frames. ], batch size: 92, lr: 7.05e-03, grad_scale: 1.152921504606847e+18
2024-08-11 18:09:41,107 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 19 from Vox, 43 from AS
2024-08-11 18:10:05,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1223480.0, ans=0.125
2024-08-11 18:10:22,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1223580.0, ans=0.125
2024-08-11 18:10:44,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1223680.0, ans=0.125
2024-08-11 18:10:51,758 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.88 vs. limit=22.5
2024-08-11 18:11:23,725 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6450, loss[loss=0.08941, beats_loss=0.01009, ecapa_loss=0.0001786, whisper_loss=0.07753, over 14028.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01137, ecapa_loss=0.0001915, whisper_loss=0.09281, over 3933497.29 frames. ], batch size: 53, lr: 7.05e-03, grad_scale: 1.152921504606847e+18
2024-08-11 18:11:34,492 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts.
21 from LS+wenet, 21 from Vox, 37 from AS
2024-08-11 18:11:35,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1223880.0, ans=0.1
2024-08-11 18:11:44,890 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 from AS
2024-08-11 18:11:57,481 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 16 from Vox, 23 from AS
2024-08-11 18:11:58,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1223980.0, ans=0.125
2024-08-11 18:12:00,910 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 from AS
2024-08-11 18:12:08,049 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.662e+01 3.047e+01 3.508e+01 5.395e+01, threshold=6.093e+01, percent-clipped=0.0
2024-08-11 18:12:08,760 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0
2024-08-11 18:13:12,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0
2024-08-11 18:13:26,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6500, loss[loss=0.1007, beats_loss=0.01195, ecapa_loss=0.0001796, whisper_loss=0.08692, over 23189.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01132, ecapa_loss=0.0001912, whisper_loss=0.09292, over 3923605.65 frames.
], batch size: 96, lr: 7.04e-03, grad_scale: 5.764607523034235e+17
2024-08-11 18:13:39,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1224380.0, ans=0.0
2024-08-11 18:14:53,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1224680.0, ans=0.0
2024-08-11 18:14:57,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1224680.0, ans=0.125
2024-08-11 18:15:01,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1224780.0, ans=0.5
2024-08-11 18:15:06,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1224780.0, ans=0.125
2024-08-11 18:15:24,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6550, loss[loss=0.09747, beats_loss=0.01157, ecapa_loss=0.0001748, whisper_loss=0.08415, over 23437.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01136, ecapa_loss=0.0001916, whisper_loss=0.09295, over 3913861.34 frames. ], batch size: 94, lr: 7.04e-03, grad_scale: 1.152921504606847e+18
2024-08-11 18:15:36,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1224880.0, ans=0.0
2024-08-11 18:15:43,156 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
29 from LS+wenet, 23 from Vox, 38 from AS
2024-08-11 18:15:43,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1224880.0, ans=0.2
2024-08-11 18:16:06,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+01 2.812e+01 3.232e+01 4.010e+01 5.660e+01, threshold=6.463e+01, percent-clipped=0.0
2024-08-11 18:16:17,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0
2024-08-11 18:16:27,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1225080.0, ans=15.0
2024-08-11 18:16:35,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1225180.0, ans=0.2
2024-08-11 18:16:39,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1225180.0, ans=0.125
2024-08-11 18:16:50,321 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 30 from Vox, 34 from AS
2024-08-11 18:16:56,108 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS
2024-08-11 18:16:57,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1225280.0, ans=0.0
2024-08-11 18:17:05,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6600, loss[loss=0.115, beats_loss=0.01032, ecapa_loss=0.0001971, whisper_loss=0.1028, over 18808.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01132, ecapa_loss=0.0001944, whisper_loss=0.09307, over 3916548.79 frames.
], batch size: 72, lr: 7.04e-03, grad_scale: 1.152921504606847e+18
2024-08-11 18:17:17,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1225380.0, ans=0.125
2024-08-11 18:17:28,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1225480.0, ans=0.025
2024-08-11 18:17:29,636 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 from AS
2024-08-11 18:17:32,531 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 24 from Vox, 34 from AS
2024-08-11 18:17:33,936 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 from AS
2024-08-11 18:18:08,189 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 from AS
2024-08-11 18:18:10,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1225680.0, ans=0.0
2024-08-11 18:18:33,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6650, loss[loss=0.1085, beats_loss=0.01396, ecapa_loss=0.0001508, whisper_loss=0.09308, over 23814.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01125, ecapa_loss=0.0001953, whisper_loss=0.0933, over 3894552.35 frames. ], batch size: 93, lr: 7.04e-03, grad_scale: 1.152921504606847e+18
2024-08-11 18:18:55,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0
2024-08-11 18:19:02,892 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.730e+01 3.226e+01 3.856e+01 7.096e+01, threshold=6.452e+01, percent-clipped=1.0
2024-08-11 18:19:29,536 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 from AS
2024-08-11 18:19:40,255 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts.
22 from LS+wenet, 17 from Vox, 25 from AS
2024-08-11 18:20:00,213 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6700, loss[loss=0.09907, beats_loss=0.01372, ecapa_loss=0.0001596, whisper_loss=0.08375, over 17351.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01127, ecapa_loss=0.0001954, whisper_loss=0.09359, over 3884520.79 frames. ], batch size: 69, lr: 7.04e-03, grad_scale: 5.764607523034235e+17
2024-08-11 18:20:03,924 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 from AS
2024-08-11 18:20:09,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1226380.0, ans=0.0
2024-08-11 18:20:26,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1226480.0, ans=0.125
2024-08-11 18:20:41,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1226580.0, ans=0.125
2024-08-11 18:20:50,409 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 from AS
2024-08-11 18:20:59,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1226680.0, ans=0.125
2024-08-11 18:21:01,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1226680.0, ans=15.0
2024-08-11 18:21:08,058 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 39 from LS+wenet, 25 from Vox, 30 from AS
2024-08-11 18:21:20,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1226780.0, ans=0.125
2024-08-11 18:21:25,158 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6750, loss[loss=0.1101, beats_loss=0.01212, ecapa_loss=0.0001504, whisper_loss=0.09646, over 21210.00 frames.
], tot_loss[loss=0.1074, beats_loss=0.01129, ecapa_loss=0.000195, whisper_loss=0.09416, over 3913492.59 frames. ], batch size: 83, lr: 7.04e-03, grad_scale: 5.764607523034235e+17
2024-08-11 18:21:25,356 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 from AS
2024-08-11 18:21:37,021 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 from AS
2024-08-11 18:21:37,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1226880.0, ans=0.1
2024-08-11 18:21:39,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1226880.0, ans=0.1
2024-08-11 18:21:39,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1226880.0, ans=0.0
2024-08-11 18:21:42,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=15.0
2024-08-11 18:21:46,483 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts.
18 from LS+wenet, 14 from Vox, 29 from AS
2024-08-11 18:21:57,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.725e+01 3.041e+01 3.593e+01 5.305e+01, threshold=6.083e+01, percent-clipped=0.0
2024-08-11 18:21:58,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1226980.0, ans=0.1
2024-08-11 18:22:03,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1227080.0, ans=0.0
2024-08-11 18:22:13,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1227080.0, ans=0.125
2024-08-11 18:22:13,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1227080.0, ans=0.125
2024-08-11 18:22:21,913 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 from AS
2024-08-11 18:22:23,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1227180.0, ans=0.125
2024-08-11 18:22:38,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0
2024-08-11 18:22:41,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1227280.0, ans=0.0
2024-08-11 18:22:49,761 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 15 from LS+wenet, 26 from Vox, 38 from AS
2024-08-11 18:22:52,582 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6800, loss[loss=0.1179, beats_loss=0.008995, ecapa_loss=0.0002512, whisper_loss=0.1064, over 22145.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01128, ecapa_loss=0.0001957, whisper_loss=0.09378, over 3929232.34 frames.
], batch size: 91, lr: 7.04e-03, grad_scale: 5.764607523034235e+17
2024-08-11 18:22:57,369 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 from AS
2024-08-11 18:23:16,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0
2024-08-11 18:23:16,905 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 23 from Vox, 29 from AS
2024-08-11 18:23:20,405 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 16 from Vox, 40 from AS
2024-08-11 18:23:45,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0
2024-08-11 18:24:04,717 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0
2024-08-11 18:24:05,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1227780.0, ans=0.125
2024-08-11 18:24:05,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1227780.0, ans=0.125
2024-08-11 18:24:18,908 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 from AS
2024-08-11 18:24:20,012 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6850, loss[loss=0.113, beats_loss=0.01236, ecapa_loss=0.0001629, whisper_loss=0.09902, over 16866.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01124, ecapa_loss=0.0001949, whisper_loss=0.09318, over 3898560.74 frames.
], batch size: 64, lr: 7.03e-03, grad_scale: 5.764607523034235e+17
2024-08-11 18:24:30,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1227880.0, ans=0.0
2024-08-11 18:24:52,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.557e+01 2.801e+01 3.138e+01 4.430e+01, threshold=5.603e+01, percent-clipped=0.0
2024-08-11 18:25:05,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1228080.0, ans=0.125
2024-08-11 18:25:12,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1228080.0, ans=0.09899494936611666
2024-08-11 18:25:13,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0
2024-08-11 18:25:24,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0
2024-08-11 18:25:36,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0
2024-08-11 18:25:49,847 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 from AS
2024-08-11 18:25:51,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1228380.0, ans=0.1
2024-08-11 18:25:52,439 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6900, loss[loss=0.09016, beats_loss=0.01137, ecapa_loss=0.0002021, whisper_loss=0.07677, over 21505.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01127, ecapa_loss=0.0001929, whisper_loss=0.09295, over 3864998.31 frames.
], batch size: 90, lr: 7.03e-03, grad_scale: 5.764607523034235e+17
2024-08-11 18:25:55,333 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 from AS
2024-08-11 18:26:01,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1228380.0, ans=0.0
2024-08-11 18:26:23,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1228480.0, ans=0.125
2024-08-11 18:26:40,576 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 from AS
2024-08-11 18:26:45,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1228680.0, ans=0.0
2024-08-11 18:27:23,259 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 6950, loss[loss=0.1249, beats_loss=0.009455, ecapa_loss=0.0001885, whisper_loss=0.1136, over 20473.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01127, ecapa_loss=0.0001926, whisper_loss=0.09312, over 3839463.08 frames. ], batch size: 79, lr: 7.03e-03, grad_scale: 5.764607523034235e+17
2024-08-11 18:27:35,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1228880.0, ans=0.125
2024-08-11 18:27:38,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1228880.0, ans=0.125
2024-08-11 18:27:56,162 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.618e+01 3.004e+01 3.400e+01 5.942e+01, threshold=6.008e+01, percent-clipped=1.0
2024-08-11 18:27:57,104 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.85 vs. limit=15.0
2024-08-11 18:27:59,794 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts.
22 from LS+wenet, 21 from Vox, 37 from AS
2024-08-11 18:28:14,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1229080.0, ans=0.125
2024-08-11 18:28:16,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.20 vs. limit=12.0
2024-08-11 18:28:20,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1229180.0, ans=0.5
2024-08-11 18:28:21,944 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 27 from Vox, 29 from AS
2024-08-11 18:28:40,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1229280.0, ans=0.05
2024-08-11 18:28:54,278 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7000, loss[loss=0.1197, beats_loss=0.01132, ecapa_loss=0.0001832, whisper_loss=0.1065, over 23231.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01115, ecapa_loss=0.0001937, whisper_loss=0.09371, over 3842197.20 frames. ], batch size: 91, lr: 7.03e-03, grad_scale: 5.764607523034235e+17
2024-08-11 18:28:54,855 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 15 from Vox, 35 from AS
2024-08-11 18:28:58,547 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 16 from Vox, 35 from AS
2024-08-11 18:29:21,931 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 29 from LS+wenet, 15 from Vox, 29 from AS
2024-08-11 18:29:30,261 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 18 from Vox, 30 from AS
2024-08-11 18:29:44,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.20 vs. limit=22.5
2024-08-11 18:29:55,028 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 18:30:07,012 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 18:30:23,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7050, loss[loss=0.1195, beats_loss=0.009094, ecapa_loss=0.0002592, whisper_loss=0.1078, over 21408.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01119, ecapa_loss=0.0001941, whisper_loss=0.09362, over 3885407.71 frames. ], batch size: 92, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:30:35,144 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 18:30:47,423 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 18:30:49,469 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 18:30:53,416 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 19 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-11 18:30:54,777 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.713e+01 3.050e+01 3.555e+01 6.661e+01, threshold=6.100e+01, percent-clipped=2.0 2024-08-11 18:30:55,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1229980.0, ans=0.125 2024-08-11 18:31:02,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1230080.0, ans=0.1 2024-08-11 18:31:08,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.65 vs. limit=22.5 2024-08-11 18:31:43,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.74 vs. 
limit=22.5 2024-08-11 18:31:52,612 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7100, loss[loss=0.09017, beats_loss=0.01097, ecapa_loss=0.0001754, whisper_loss=0.07745, over 14546.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01121, ecapa_loss=0.0001918, whisper_loss=0.0936, over 3871751.91 frames. ], batch size: 57, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:31:57,610 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 18:32:00,981 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 18:32:01,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1230380.0, ans=0.125 2024-08-11 18:32:16,586 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2024-08-11 18:32:38,300 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 18:32:43,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1230680.0, ans=0.125 2024-08-11 18:32:53,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1230680.0, ans=0.125 2024-08-11 18:33:05,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-08-11 18:33:08,057 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 18:33:17,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1230780.0, ans=0.125 2024-08-11 18:33:21,479 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7150, loss[loss=0.09535, beats_loss=0.01227, ecapa_loss=0.0001824, whisper_loss=0.08125, over 23265.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01122, ecapa_loss=0.0001914, whisper_loss=0.09365, over 3897220.87 frames. ], batch size: 93, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:33:51,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1230980.0, ans=0.125 2024-08-11 18:33:54,877 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.688e+01 3.029e+01 3.368e+01 5.006e+01, threshold=6.058e+01, percent-clipped=0.0 2024-08-11 18:33:55,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1230980.0, ans=0.0 2024-08-11 18:34:46,327 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-08-11 18:34:55,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7200, loss[loss=0.09857, beats_loss=0.01175, ecapa_loss=0.0002152, whisper_loss=0.08467, over 14213.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01115, ecapa_loss=0.0001915, whisper_loss=0.09428, over 3912199.64 frames. ], batch size: 61, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:35:44,659 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 20 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-11 18:36:15,017 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
32 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 18:36:19,525 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:36:19,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1231780.0, ans=15.0 2024-08-11 18:36:21,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1231880.0, ans=0.2 2024-08-11 18:36:21,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=22.5 2024-08-11 18:36:21,817 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7250, loss[loss=0.111, beats_loss=0.01357, ecapa_loss=0.0001782, whisper_loss=0.09568, over 22520.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01116, ecapa_loss=0.0001923, whisper_loss=0.094, over 3909012.11 frames. ], batch size: 91, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:36:31,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1231880.0, ans=0.0 2024-08-11 18:36:36,032 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.04 vs. 
limit=22.5 2024-08-11 18:36:37,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1231980.0, ans=0.0 2024-08-11 18:36:39,202 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.559e+02 2024-08-11 18:36:49,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1231980.0, ans=0.0 2024-08-11 18:36:51,915 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.618e+01 2.954e+01 3.399e+01 5.489e+01, threshold=5.908e+01, percent-clipped=0.0 2024-08-11 18:37:01,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1232080.0, ans=0.0 2024-08-11 18:37:04,838 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.43 vs. limit=15.0 2024-08-11 18:37:27,184 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 18:37:31,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1232280.0, ans=0.125 2024-08-11 18:37:43,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1232380.0, ans=0.1 2024-08-11 18:37:45,502 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7300, loss[loss=0.1188, beats_loss=0.00995, ecapa_loss=0.0001885, whisper_loss=0.1069, over 21790.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0112, ecapa_loss=0.0001922, whisper_loss=0.09288, over 3848074.84 frames. 
], batch size: 88, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:37:49,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1232380.0, ans=0.125 2024-08-11 18:37:51,807 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.71 vs. limit=8.0 2024-08-11 18:37:58,183 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-11 18:38:29,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1232580.0, ans=0.125 2024-08-11 18:38:40,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1232680.0, ans=0.05 2024-08-11 18:38:47,762 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-11 18:39:03,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1232780.0, ans=0.125 2024-08-11 18:39:09,644 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7350, loss[loss=0.1037, beats_loss=0.01247, ecapa_loss=0.0002116, whisper_loss=0.08914, over 22048.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01122, ecapa_loss=0.0001925, whisper_loss=0.09254, over 3859930.67 frames. 
], batch size: 93, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:39:11,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1232880.0, ans=0.0 2024-08-11 18:39:26,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1232980.0, ans=0.0 2024-08-11 18:39:37,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1232980.0, ans=0.1 2024-08-11 18:39:38,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1232980.0, ans=0.0 2024-08-11 18:39:39,041 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.552e+01 3.033e+01 3.374e+01 5.510e+01, threshold=6.067e+01, percent-clipped=0.0 2024-08-11 18:39:40,672 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.95 vs. limit=10.0 2024-08-11 18:39:46,559 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 18:39:49,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2024-08-11 18:40:06,524 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 18:40:26,992 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-11 18:40:32,596 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7400, loss[loss=0.1261, beats_loss=0.01018, ecapa_loss=0.0001813, whisper_loss=0.1141, over 20147.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01121, ecapa_loss=0.0001924, whisper_loss=0.09294, over 3893026.00 frames. 
], batch size: 79, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:40:34,767 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 18:41:01,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1233480.0, ans=0.1 2024-08-11 18:41:04,746 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 13 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-11 18:41:07,517 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-11 18:41:18,156 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-11 18:41:20,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1233580.0, ans=0.125 2024-08-11 18:41:20,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1233580.0, ans=0.1 2024-08-11 18:41:21,076 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 23 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-11 18:41:30,035 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 18:41:31,442 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 18:41:37,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1233680.0, ans=0.125 2024-08-11 18:41:38,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1233780.0, ans=0.0 2024-08-11 18:41:55,587 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7450, loss[loss=0.1283, beats_loss=0.007888, ecapa_loss=0.0001834, whisper_loss=0.1186, over 15303.00 frames. 
], tot_loss[loss=0.1061, beats_loss=0.01117, ecapa_loss=0.0001928, whisper_loss=0.09296, over 3872161.02 frames. ], batch size: 55, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:42:03,644 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5 2024-08-11 18:42:07,422 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2024-08-11 18:42:21,470 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-11 18:42:28,078 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.705e+01 3.012e+01 3.463e+01 6.106e+01, threshold=6.024e+01, percent-clipped=1.0 2024-08-11 18:42:49,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1234180.0, ans=0.125 2024-08-11 18:43:07,019 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 18:43:21,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1234380.0, ans=0.0 2024-08-11 18:43:22,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7500, loss[loss=0.09153, beats_loss=0.01382, ecapa_loss=0.0002333, whisper_loss=0.07538, over 13180.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01126, ecapa_loss=0.0001916, whisper_loss=0.09273, over 3891910.13 frames. 
], batch size: 58, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:43:27,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1234380.0, ans=0.125 2024-08-11 18:43:46,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1234480.0, ans=0.0 2024-08-11 18:43:52,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1234480.0, ans=0.025 2024-08-11 18:43:55,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1234580.0, ans=0.1 2024-08-11 18:44:04,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1234580.0, ans=0.125 2024-08-11 18:44:04,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1234580.0, ans=0.125 2024-08-11 18:44:06,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1234580.0, ans=0.0 2024-08-11 18:44:08,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1234580.0, ans=0.0 2024-08-11 18:44:29,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.77 vs. limit=5.0 2024-08-11 18:44:35,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=16.68 vs. limit=15.0 2024-08-11 18:44:38,640 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
17 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 18:44:44,422 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7550, loss[loss=0.1107, beats_loss=0.01224, ecapa_loss=0.0002123, whisper_loss=0.09636, over 19976.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01131, ecapa_loss=0.000192, whisper_loss=0.09269, over 3877885.16 frames. ], batch size: 85, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:44:46,085 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-11 18:44:51,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1234880.0, ans=0.125 2024-08-11 18:44:52,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1234880.0, ans=0.125 2024-08-11 18:45:02,670 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-11 18:45:12,812 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.591e+01 2.941e+01 3.490e+01 1.489e+02, threshold=5.883e+01, percent-clipped=2.0 2024-08-11 18:45:35,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1235180.0, ans=0.125 2024-08-11 18:45:47,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1235180.0, ans=0.0 2024-08-11 18:45:57,226 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 25 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 18:46:07,139 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7600, loss[loss=0.09862, beats_loss=0.01278, ecapa_loss=0.000178, whisper_loss=0.08406, over 22651.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01133, ecapa_loss=0.0001927, whisper_loss=0.09198, over 3882220.92 frames. 
], batch size: 90, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:46:17,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.83 vs. limit=15.0 2024-08-11 18:46:28,493 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-11 18:46:42,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1235580.0, ans=0.1 2024-08-11 18:46:43,290 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 19 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-11 18:46:59,271 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 18:47:00,820 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 18:47:05,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1235680.0, ans=0.125 2024-08-11 18:47:19,511 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 18:47:25,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1235780.0, ans=0.2 2024-08-11 18:47:31,449 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 18:47:33,154 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-11 18:47:34,255 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7650, loss[loss=0.1026, beats_loss=0.008448, ecapa_loss=0.0002077, whisper_loss=0.09212, over 17598.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01116, ecapa_loss=0.0001931, whisper_loss=0.09268, over 3883478.96 frames. 
], batch size: 70, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:47:45,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=1235880.0, ans=0.1 2024-08-11 18:48:04,436 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.623e+01 3.033e+01 3.717e+01 6.248e+01, threshold=6.065e+01, percent-clipped=1.0 2024-08-11 18:48:20,946 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 18:48:21,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1236080.0, ans=0.125 2024-08-11 18:48:33,747 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 12 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 18:48:39,090 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 18:48:45,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1236280.0, ans=0.125 2024-08-11 18:48:54,778 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 18:48:55,109 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:48:58,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=22.5 2024-08-11 18:49:00,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7700, loss[loss=0.08878, beats_loss=0.01095, ecapa_loss=0.0001756, whisper_loss=0.07607, over 14568.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0112, ecapa_loss=0.0001921, whisper_loss=0.09196, over 3868233.49 frames. 
], batch size: 57, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:49:03,205 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 18:49:06,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1236380.0, ans=0.125 2024-08-11 18:49:24,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1236480.0, ans=0.5 2024-08-11 18:49:28,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1236480.0, ans=0.125 2024-08-11 18:49:36,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.11 vs. limit=10.0 2024-08-11 18:49:46,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2024-08-11 18:49:58,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1236680.0, ans=0.2 2024-08-11 18:50:22,656 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7750, loss[loss=0.09505, beats_loss=0.0127, ecapa_loss=0.0001626, whisper_loss=0.08072, over 23996.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01118, ecapa_loss=0.0001911, whisper_loss=0.09256, over 3885979.83 frames. ], batch size: 94, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:50:38,041 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 18:50:43,999 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 18:50:44,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1236980.0, ans=0.0 2024-08-11 18:50:48,103 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-11 18:50:52,343 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.693e+01 2.903e+01 3.373e+01 1.168e+02, threshold=5.806e+01, percent-clipped=1.0 2024-08-11 18:51:00,423 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 18:51:23,914 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 18:51:31,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1237280.0, ans=0.1 2024-08-11 18:51:41,587 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7800, loss[loss=0.1048, beats_loss=0.009997, ecapa_loss=0.0002261, whisper_loss=0.09254, over 18183.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01119, ecapa_loss=0.0001916, whisper_loss=0.0926, over 3884281.30 frames. ], batch size: 76, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:51:45,388 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 18:51:53,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1237380.0, ans=0.09899494936611666 2024-08-11 18:52:01,104 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 18:52:19,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=12.0 2024-08-11 18:52:22,061 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-11 18:52:31,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.67 vs. limit=22.5 2024-08-11 18:52:37,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1237680.0, ans=0.125 2024-08-11 18:52:57,217 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7850, loss[loss=0.1066, beats_loss=0.01076, ecapa_loss=0.000178, whisper_loss=0.09405, over 22550.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01126, ecapa_loss=0.0001913, whisper_loss=0.09287, over 3896842.37 frames. ], batch size: 91, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:53:00,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-11 18:53:00,917 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-11 18:53:03,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1237880.0, ans=0.125 2024-08-11 18:53:13,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1237980.0, ans=0.125 2024-08-11 18:53:20,574 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-11 18:53:24,523 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.579e+01 2.865e+01 3.320e+01 8.816e+01, threshold=5.729e+01, percent-clipped=1.0 2024-08-11 18:53:29,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1238080.0, ans=0.025 2024-08-11 18:54:10,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1238280.0, ans=0.0 2024-08-11 18:54:13,079 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7900, loss[loss=0.09306, beats_loss=0.01255, ecapa_loss=0.000171, whisper_loss=0.0788, over 16529.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01132, ecapa_loss=0.0001899, whisper_loss=0.09294, over 3872316.90 frames. ], batch size: 68, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:54:16,226 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 18:54:26,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1238380.0, ans=0.125 2024-08-11 18:54:30,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1238480.0, ans=0.0 2024-08-11 18:54:35,790 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 18:54:43,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1238580.0, ans=0.0 2024-08-11 18:54:44,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1238580.0, ans=0.1 2024-08-11 18:54:54,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1238580.0, ans=0.125 2024-08-11 18:55:05,107 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.92 vs. limit=10.0 2024-08-11 18:55:15,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1238780.0, ans=0.0 2024-08-11 18:55:27,243 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 7950, loss[loss=0.1102, beats_loss=0.01068, ecapa_loss=0.0001783, whisper_loss=0.09774, over 19438.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01122, ecapa_loss=0.0001911, whisper_loss=0.09353, over 3864353.42 frames. ], batch size: 77, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:55:29,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1238880.0, ans=0.125 2024-08-11 18:55:52,650 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.748e+01 3.056e+01 3.459e+01 5.765e+01, threshold=6.112e+01, percent-clipped=1.0 2024-08-11 18:55:54,231 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 18:55:59,890 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-11 18:56:26,626 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
28 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 18:56:29,123 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 18:56:37,559 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8000, loss[loss=0.1071, beats_loss=0.01015, ecapa_loss=0.0001784, whisper_loss=0.09517, over 16346.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01119, ecapa_loss=0.0001908, whisper_loss=0.09357, over 3852193.00 frames. ], batch size: 62, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:56:44,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1239380.0, ans=0.0 2024-08-11 18:56:52,386 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 18:57:03,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1239480.0, ans=0.1 2024-08-11 18:57:22,294 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-11 18:57:27,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.39 vs. limit=15.0 2024-08-11 18:57:36,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1239780.0, ans=0.0 2024-08-11 18:57:36,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1239780.0, ans=0.1 2024-08-11 18:57:43,334 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=22.5 2024-08-11 18:57:48,183 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8050, loss[loss=0.1063, beats_loss=0.01294, ecapa_loss=0.0001958, whisper_loss=0.09137, over 17458.00 frames. 
], tot_loss[loss=0.1069, beats_loss=0.01113, ecapa_loss=0.0001909, whisper_loss=0.09384, over 3876456.65 frames. ], batch size: 73, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:58:07,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1239980.0, ans=0.125 2024-08-11 18:58:14,244 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.885e+01 3.265e+01 3.759e+01 1.907e+02, threshold=6.530e+01, percent-clipped=2.0 2024-08-11 18:58:56,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8100, loss[loss=0.0931, beats_loss=0.01385, ecapa_loss=0.0001866, whisper_loss=0.07738, over 22391.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01114, ecapa_loss=0.0001905, whisper_loss=0.09298, over 3853938.93 frames. ], batch size: 93, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:59:22,980 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-11 18:59:34,145 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.232e-02 2024-08-11 18:59:35,039 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-11 18:59:36,212 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 9 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 18:59:57,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1240780.0, ans=0.125 2024-08-11 19:00:01,542 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 38 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 19:00:02,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8150, loss[loss=0.1293, beats_loss=0.009264, ecapa_loss=0.0002049, whisper_loss=0.1179, over 22383.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01118, ecapa_loss=0.0001901, whisper_loss=0.0927, over 3875437.56 frames. 
], batch size: 89, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:00:14,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1240880.0, ans=0.125 2024-08-11 19:00:24,319 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 19:00:26,618 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.542e+01 2.871e+01 3.241e+01 4.432e+01, threshold=5.742e+01, percent-clipped=0.0 2024-08-11 19:00:32,240 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-11 19:00:36,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1241080.0, ans=0.2 2024-08-11 19:00:50,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1241180.0, ans=0.125 2024-08-11 19:00:56,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5 2024-08-11 19:01:08,801 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8200, loss[loss=0.1185, beats_loss=0.009461, ecapa_loss=0.0001807, whisper_loss=0.1072, over 23022.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0112, ecapa_loss=0.0001907, whisper_loss=0.09261, over 3902616.43 frames. 
], batch size: 90, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:01:33,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1241580.0, ans=0.125 2024-08-11 19:01:36,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1241580.0, ans=0.125 2024-08-11 19:01:38,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1241580.0, ans=0.125 2024-08-11 19:01:39,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2024-08-11 19:01:42,582 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 21 from Vox, 15 fro AS 2024-08-11 19:01:56,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1241680.0, ans=0.0 2024-08-11 19:01:58,546 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 19:02:00,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1241780.0, ans=0.5 2024-08-11 19:02:08,034 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 19:02:13,280 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 31 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 19:02:14,346 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8250, loss[loss=0.121, beats_loss=0.009292, ecapa_loss=0.000207, whisper_loss=0.1096, over 21656.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01121, ecapa_loss=0.0001894, whisper_loss=0.09255, over 3899974.27 frames. 
], batch size: 85, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:02:37,826 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.577e+01 2.823e+01 3.231e+01 7.611e+01, threshold=5.645e+01, percent-clipped=2.0 2024-08-11 19:02:42,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-11 19:02:45,870 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 20 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-11 19:02:49,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1242080.0, ans=0.2 2024-08-11 19:02:49,941 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 10 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 19:02:50,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1242080.0, ans=0.0 2024-08-11 19:02:50,476 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.80 vs. limit=10.0 2024-08-11 19:02:53,750 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 19:03:02,796 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 19:03:12,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1242280.0, ans=0.125 2024-08-11 19:03:19,944 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8300, loss[loss=0.09046, beats_loss=0.01331, ecapa_loss=0.0001773, whisper_loss=0.07538, over 22818.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01125, ecapa_loss=0.0001891, whisper_loss=0.09231, over 3898334.17 frames. 
], batch size: 95, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:03:30,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1242380.0, ans=0.09899494936611666 2024-08-11 19:03:34,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.65 vs. limit=15.0 2024-08-11 19:03:39,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1242480.0, ans=0.125 2024-08-11 19:03:46,232 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 19:03:58,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=15.0 2024-08-11 19:04:08,732 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 19:04:12,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2024-08-11 19:04:17,850 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 19:04:25,413 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8350, loss[loss=0.09812, beats_loss=0.01174, ecapa_loss=0.0001849, whisper_loss=0.08453, over 19961.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01123, ecapa_loss=0.000191, whisper_loss=0.09269, over 3912485.70 frames. 
], batch size: 78, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:04:30,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1242880.0, ans=0.125 2024-08-11 19:04:49,305 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.806e+01 3.050e+01 3.549e+01 1.399e+02, threshold=6.100e+01, percent-clipped=1.0 2024-08-11 19:04:53,293 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 19:04:58,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1243080.0, ans=0.125 2024-08-11 19:05:21,551 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-11 19:05:28,569 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-11 19:05:30,965 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8400, loss[loss=0.1183, beats_loss=0.009796, ecapa_loss=0.0001894, whisper_loss=0.1066, over 22885.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01122, ecapa_loss=0.0001916, whisper_loss=0.0928, over 3912931.69 frames. ], batch size: 88, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:05:37,456 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 19:05:41,603 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.969e-01 2024-08-11 19:05:44,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1243480.0, ans=0.125 2024-08-11 19:05:50,989 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 19:05:53,635 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
19 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 19:06:06,285 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 33 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 19:06:08,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1243580.0, ans=0.07 2024-08-11 19:06:10,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=1243680.0, ans=0.1 2024-08-11 19:06:13,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1243680.0, ans=0.0 2024-08-11 19:06:22,606 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 28 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 19:06:22,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1243780.0, ans=0.0 2024-08-11 19:06:28,019 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 19:06:36,992 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8450, loss[loss=0.1055, beats_loss=0.01074, ecapa_loss=0.0001903, whisper_loss=0.0929, over 23553.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01116, ecapa_loss=0.0001913, whisper_loss=0.09378, over 3941926.68 frames. ], batch size: 92, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:06:39,622 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-11 19:06:42,110 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
15 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 19:07:00,438 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.505e+01 2.848e+01 3.231e+01 4.188e+01, threshold=5.696e+01, percent-clipped=0.0 2024-08-11 19:07:12,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1244080.0, ans=0.125 2024-08-11 19:07:28,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1244280.0, ans=0.5 2024-08-11 19:07:42,628 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8500, loss[loss=0.1082, beats_loss=0.01168, ecapa_loss=0.0001871, whisper_loss=0.09469, over 22526.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01115, ecapa_loss=0.0001906, whisper_loss=0.09372, over 3960679.56 frames. ], batch size: 92, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:08:25,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1244680.0, ans=0.2 2024-08-11 19:08:36,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1244780.0, ans=0.1 2024-08-11 19:08:46,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1244780.0, ans=0.125 2024-08-11 19:08:49,034 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8550, loss[loss=0.07918, beats_loss=0.01518, ecapa_loss=0.0001792, whisper_loss=0.06221, over 21884.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01121, ecapa_loss=0.0001898, whisper_loss=0.09345, over 3959358.58 frames. 
], batch size: 90, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:08:52,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1244880.0, ans=0.0 2024-08-11 19:09:13,166 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.056e+01 2.649e+01 3.008e+01 3.594e+01 2.630e+02, threshold=6.016e+01, percent-clipped=2.0 2024-08-11 19:09:15,070 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.47 vs. limit=15.0 2024-08-11 19:09:17,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1245080.0, ans=0.1 2024-08-11 19:09:22,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2024-08-11 19:09:31,398 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.25 vs. limit=15.0 2024-08-11 19:09:32,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1245180.0, ans=15.0 2024-08-11 19:09:43,005 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 24 from LS+wenet, 32 from Vox, 40 fro AS 2024-08-11 19:09:45,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1245280.0, ans=0.2 2024-08-11 19:09:46,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1245280.0, ans=0.125 2024-08-11 19:09:48,047 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
22 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-11 19:09:54,576 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8600, loss[loss=0.1002, beats_loss=0.01231, ecapa_loss=0.000129, whisper_loss=0.08657, over 18454.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01112, ecapa_loss=0.0001912, whisper_loss=0.09381, over 3947444.20 frames. ], batch size: 68, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:09:56,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1245380.0, ans=0.2 2024-08-11 19:09:57,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1245380.0, ans=0.2 2024-08-11 19:10:07,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2024-08-11 19:10:08,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1245480.0, ans=0.2 2024-08-11 19:10:13,270 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 19:10:17,067 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 19:10:18,445 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 19:10:21,147 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
10 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-11 19:10:22,756 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.151e-02 2024-08-11 19:10:30,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1245580.0, ans=0.0 2024-08-11 19:10:58,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.99 vs. limit=22.5 2024-08-11 19:11:01,871 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8650, loss[loss=0.1077, beats_loss=0.01159, ecapa_loss=0.0001905, whisper_loss=0.09417, over 20970.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01112, ecapa_loss=0.0001907, whisper_loss=0.09332, over 3917109.58 frames. ], batch size: 88, lr: 6.98e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:11:10,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1245880.0, ans=0.125 2024-08-11 19:11:17,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1245980.0, ans=0.125 2024-08-11 19:11:18,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2024-08-11 19:11:20,026 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 19:11:26,501 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.702e+01 2.920e+01 3.348e+01 5.833e+01, threshold=5.840e+01, percent-clipped=0.0 2024-08-11 19:11:29,660 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
23 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 19:11:32,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1246080.0, ans=0.125 2024-08-11 19:11:34,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1246080.0, ans=0.0 2024-08-11 19:11:47,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1246180.0, ans=0.125 2024-08-11 19:11:58,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-11 19:12:05,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1246280.0, ans=0.0 2024-08-11 19:12:06,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.20 vs. limit=15.0 2024-08-11 19:12:09,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1246280.0, ans=0.125 2024-08-11 19:12:12,966 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8700, loss[loss=0.08065, beats_loss=0.01282, ecapa_loss=0.0001361, whisper_loss=0.06647, over 19996.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01124, ecapa_loss=0.0001901, whisper_loss=0.09208, over 3896761.14 frames. ], batch size: 78, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:12:18,323 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
17 from LS+wenet, 26 from Vox, 51 fro AS 2024-08-11 19:12:18,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1246380.0, ans=0.0 2024-08-11 19:12:26,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.80 vs. limit=22.5 2024-08-11 19:12:27,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2024-08-11 19:12:31,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1246480.0, ans=0.2 2024-08-11 19:12:53,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1246580.0, ans=0.125 2024-08-11 19:13:04,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1246680.0, ans=0.2 2024-08-11 19:13:14,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2024-08-11 19:13:31,531 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8750, loss[loss=0.08817, beats_loss=0.01208, ecapa_loss=0.0002125, whisper_loss=0.07396, over 17203.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0113, ecapa_loss=0.0001897, whisper_loss=0.09173, over 3868336.87 frames. ], batch size: 69, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:13:35,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5 2024-08-11 19:13:36,696 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 19:13:42,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1246880.0, ans=0.125 2024-08-11 19:13:57,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.47 vs. limit=22.5 2024-08-11 19:14:02,194 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.729e+01 3.149e+01 3.725e+01 7.299e+01, threshold=6.297e+01, percent-clipped=2.0 2024-08-11 19:14:03,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1246980.0, ans=0.125 2024-08-11 19:14:14,532 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 19:14:29,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1247180.0, ans=0.1 2024-08-11 19:14:32,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1247180.0, ans=0.125 2024-08-11 19:14:47,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1247280.0, ans=0.0 2024-08-11 19:14:56,587 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8800, loss[loss=0.1088, beats_loss=0.01059, ecapa_loss=0.0001638, whisper_loss=0.09654, over 17916.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01124, ecapa_loss=0.0001897, whisper_loss=0.09232, over 3853001.89 frames. 
], batch size: 66, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:15:16,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1247480.0, ans=15.0 2024-08-11 19:15:18,470 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 19:15:20,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1247480.0, ans=0.125 2024-08-11 19:15:24,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1247480.0, ans=0.125 2024-08-11 19:15:38,193 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-11 19:15:54,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1247680.0, ans=0.125 2024-08-11 19:16:12,131 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 18 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-11 19:16:18,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2024-08-11 19:16:21,632 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8850, loss[loss=0.1175, beats_loss=0.009676, ecapa_loss=0.0001827, whisper_loss=0.106, over 21265.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01136, ecapa_loss=0.0001888, whisper_loss=0.09163, over 3857071.23 frames. ], batch size: 82, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:16:34,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1247880.0, ans=0.2 2024-08-11 19:16:35,807 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 19:16:52,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.673e+01 2.972e+01 3.544e+01 5.278e+01, threshold=5.944e+01, percent-clipped=0.0 2024-08-11 19:17:07,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1248080.0, ans=0.0 2024-08-11 19:17:34,686 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 19:17:40,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1248280.0, ans=0.2 2024-08-11 19:17:40,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.95 vs. limit=15.0 2024-08-11 19:17:47,795 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8900, loss[loss=0.04837, beats_loss=0.01498, ecapa_loss=0.0001767, whisper_loss=0.03162, over 12947.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01135, ecapa_loss=0.00019, whisper_loss=0.09147, over 3812101.83 frames. ], batch size: 53, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:18:08,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.33 vs. 
limit=22.5 2024-08-11 19:18:10,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1248480.0, ans=0.125 2024-08-11 19:18:18,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1248480.0, ans=15.0 2024-08-11 19:18:21,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1248580.0, ans=0.125 2024-08-11 19:18:26,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1248580.0, ans=0.125 2024-08-11 19:18:33,291 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:18:44,695 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 19:19:06,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1248780.0, ans=0.0 2024-08-11 19:19:12,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1248780.0, ans=0.95 2024-08-11 19:19:14,944 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 8950, loss[loss=0.1172, beats_loss=0.01132, ecapa_loss=0.000167, whisper_loss=0.1042, over 24073.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01139, ecapa_loss=0.000189, whisper_loss=0.09188, over 3835668.62 frames. 
], batch size: 92, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:19:23,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1248880.0, ans=0.2 2024-08-11 19:19:25,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1248880.0, ans=0.125 2024-08-11 19:19:33,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1248980.0, ans=0.125 2024-08-11 19:19:40,363 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 19:19:44,624 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.588e+01 3.053e+01 3.414e+01 5.392e+01, threshold=6.106e+01, percent-clipped=0.0 2024-08-11 19:19:47,308 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.486e-01 2024-08-11 19:19:53,600 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 19:19:59,340 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 17 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-11 19:19:59,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. 
limit=15.0 2024-08-11 19:20:14,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1249180.0, ans=0.125 2024-08-11 19:20:18,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1249180.0, ans=0.125 2024-08-11 19:20:32,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1249280.0, ans=0.0 2024-08-11 19:20:38,772 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9000, loss[loss=0.1203, beats_loss=0.01109, ecapa_loss=0.0002102, whisper_loss=0.1071, over 22140.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01134, ecapa_loss=0.0001899, whisper_loss=0.09229, over 3852759.53 frames. ], batch size: 91, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:20:38,773 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 19:21:20,472 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on ASR_libri: loss=0.2565, beats_loss=0, ecapa_loss=0.0006239, whisper_loss=0.2503, over 922467.00 frames. 2024-08-11 19:21:39,240 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on SV_voxceleb1: loss=0.005312, beats_loss=0, ecapa_loss=0.0005312, whisper_loss=0, over 939242.00 frames. 2024-08-11 19:23:36,289 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on AT_audioset: loss=0.02491, beats_loss=0.02491, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 19:23:36,293 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 19:23:37,763 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 19:24:18,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1249580.0, ans=0.125 2024-08-11 19:24:22,487 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
27 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-11 19:24:56,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1249780.0, ans=0.125 2024-08-11 19:25:00,858 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9050, loss[loss=0.1071, beats_loss=0.008706, ecapa_loss=0.0002783, whisper_loss=0.09562, over 19822.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01128, ecapa_loss=0.000191, whisper_loss=0.0923, over 3840668.17 frames. ], batch size: 81, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:25:04,823 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 19:25:25,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1249980.0, ans=0.0 2024-08-11 19:25:32,605 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.548e+01 2.793e+01 3.280e+01 4.630e+01, threshold=5.586e+01, percent-clipped=0.0 2024-08-11 19:25:57,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1250180.0, ans=0.125 2024-08-11 19:26:02,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1250180.0, ans=0.2 2024-08-11 19:26:15,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1250280.0, ans=0.2 2024-08-11 19:26:16,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1250280.0, ans=0.125 2024-08-11 19:26:17,348 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-11 19:26:26,930 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9100, loss[loss=0.1037, beats_loss=0.01081, ecapa_loss=0.0002027, whisper_loss=0.09083, over 17318.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01132, ecapa_loss=0.000192, whisper_loss=0.09224, over 3858292.23 frames. ], batch size: 70, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:26:27,632 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 19:26:33,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1250380.0, ans=0.05 2024-08-11 19:26:34,818 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 19:26:43,113 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 19:26:43,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1250380.0, ans=0.0 2024-08-11 19:26:46,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1250480.0, ans=0.125 2024-08-11 19:26:53,364 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 36 from Vox, 32 fro AS 2024-08-11 19:27:01,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1250580.0, ans=0.125 2024-08-11 19:27:17,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2024-08-11 19:27:20,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.59 vs. 
limit=10.0 2024-08-11 19:27:27,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1250680.0, ans=0.125 2024-08-11 19:27:52,941 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9150, loss[loss=0.1228, beats_loss=0.01022, ecapa_loss=0.000216, whisper_loss=0.1105, over 22392.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01131, ecapa_loss=0.000191, whisper_loss=0.0926, over 3876719.86 frames. ], batch size: 90, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:28:03,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1250880.0, ans=0.125 2024-08-11 19:28:11,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1250980.0, ans=0.125 2024-08-11 19:28:16,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1250980.0, ans=0.125 2024-08-11 19:28:18,091 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 19:28:23,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.605e+01 2.841e+01 3.221e+01 5.369e+01, threshold=5.683e+01, percent-clipped=0.0 2024-08-11 19:28:26,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1251080.0, ans=0.125 2024-08-11 19:28:46,460 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.67 vs. limit=10.0 2024-08-11 19:28:55,217 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.18 vs. 
limit=22.5 2024-08-11 19:28:59,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1251280.0, ans=0.125 2024-08-11 19:29:00,550 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 19:29:02,309 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.030e+02 2024-08-11 19:29:02,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1251280.0, ans=0.0 2024-08-11 19:29:12,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9200, loss[loss=0.1302, beats_loss=0.00921, ecapa_loss=0.0001555, whisper_loss=0.1195, over 19614.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01127, ecapa_loss=0.0001918, whisper_loss=0.09266, over 3866082.04 frames. ], batch size: 72, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:29:13,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1251380.0, ans=0.0 2024-08-11 19:29:20,614 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 19:29:58,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-11 19:30:12,003 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-11 19:30:18,383 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-11 19:30:28,602 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9250, loss[loss=0.09596, beats_loss=0.0129, ecapa_loss=0.000173, whisper_loss=0.08132, over 22593.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01123, ecapa_loss=0.0001916, whisper_loss=0.09283, over 3890516.63 frames. 
], batch size: 91, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:30:30,858 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 19:30:55,119 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.088e-01 2024-08-11 19:30:57,038 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.691e+01 2.985e+01 3.626e+01 6.428e+01, threshold=5.970e+01, percent-clipped=0.0 2024-08-11 19:31:24,265 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 19:31:41,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1252280.0, ans=0.0 2024-08-11 19:31:45,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1252380.0, ans=0.0 2024-08-11 19:31:46,374 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9300, loss[loss=0.1096, beats_loss=0.01216, ecapa_loss=0.0001512, whisper_loss=0.09592, over 23297.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0112, ecapa_loss=0.0001921, whisper_loss=0.09309, over 3915948.86 frames. ], batch size: 91, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:31:58,219 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 19:32:12,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1252480.0, ans=0.1 2024-08-11 19:32:24,416 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.13 vs. 
limit=22.5 2024-08-11 19:32:25,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1252580.0, ans=0.2 2024-08-11 19:32:25,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1252580.0, ans=0.125 2024-08-11 19:32:28,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1252580.0, ans=0.1 2024-08-11 19:32:43,576 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 19:32:51,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1252780.0, ans=0.125 2024-08-11 19:32:53,214 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 19:33:04,229 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-11 19:33:05,246 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9350, loss[loss=0.1004, beats_loss=0.0131, ecapa_loss=0.0001307, whisper_loss=0.08604, over 18385.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01129, ecapa_loss=0.0001919, whisper_loss=0.09228, over 3884628.39 frames. ], batch size: 68, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:33:22,452 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 19:33:24,065 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 17 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 19:33:33,109 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.65 vs. 
limit=10.0 2024-08-11 19:33:35,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.606e+01 3.008e+01 3.444e+01 5.189e+01, threshold=6.015e+01, percent-clipped=1.0 2024-08-11 19:33:35,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1253080.0, ans=0.2 2024-08-11 19:33:38,357 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 19:33:44,590 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=15.0 2024-08-11 19:33:50,405 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 8 from Vox, 32 fro AS 2024-08-11 19:33:57,693 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 19:34:08,883 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-11 19:34:12,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1253280.0, ans=0.0 2024-08-11 19:34:19,416 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 19:34:21,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-11 19:34:22,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9400, loss[loss=0.1046, beats_loss=0.01095, ecapa_loss=0.000197, whisper_loss=0.0917, over 20775.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01133, ecapa_loss=0.0001914, whisper_loss=0.0918, over 3872557.65 frames. 
], batch size: 83, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:35:22,480 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.68 vs. limit=15.0 2024-08-11 19:35:30,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.98 vs. limit=15.0 2024-08-11 19:35:33,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1253780.0, ans=0.1 2024-08-11 19:35:37,017 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9450, loss[loss=0.09146, beats_loss=0.01334, ecapa_loss=0.00021, whisper_loss=0.07602, over 19384.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01135, ecapa_loss=0.0001916, whisper_loss=0.09168, over 3863539.02 frames. ], batch size: 81, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:35:38,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1253880.0, ans=0.2 2024-08-11 19:35:42,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1253880.0, ans=0.125 2024-08-11 19:35:45,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1253880.0, ans=0.125 2024-08-11 19:35:55,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1253980.0, ans=0.125 2024-08-11 19:35:57,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1253980.0, ans=0.125 2024-08-11 19:35:59,504 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
39 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-11 19:36:01,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.693e+01 3.099e+01 3.778e+01 6.565e+01, threshold=6.199e+01, percent-clipped=1.0 2024-08-11 19:36:01,969 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 8 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 19:36:24,205 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:36:25,225 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 10 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 19:36:43,660 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9500, loss[loss=0.131, beats_loss=0.009828, ecapa_loss=0.0001949, whisper_loss=0.1192, over 20536.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01137, ecapa_loss=0.0001912, whisper_loss=0.09082, over 3839086.07 frames. ], batch size: 79, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:37:00,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1254480.0, ans=0.125 2024-08-11 19:37:13,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1254580.0, ans=0.0 2024-08-11 19:37:29,717 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 19:37:32,336 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 19:37:33,746 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-11 19:37:36,471 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-11 19:37:49,046 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9550, loss[loss=0.09382, beats_loss=0.01334, ecapa_loss=0.0001828, whisper_loss=0.07865, over 12807.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01139, ecapa_loss=0.0001917, whisper_loss=0.0907, over 3847536.79 frames. ], batch size: 53, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:37:55,570 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 19:37:56,931 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 19:38:08,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1254980.0, ans=0.125 2024-08-11 19:38:11,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1254980.0, ans=0.04949747468305833 2024-08-11 19:38:13,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.508e+01 2.726e+01 3.017e+01 8.338e+01, threshold=5.453e+01, percent-clipped=1.0 2024-08-11 19:38:18,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2024-08-11 19:38:20,853 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 19:38:37,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1255180.0, ans=0.125 2024-08-11 19:38:41,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1255280.0, ans=10.0 2024-08-11 19:38:41,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1255280.0, ans=0.2 2024-08-11 19:38:47,085 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 19:38:52,212 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 19:38:54,653 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9600, loss[loss=0.1047, beats_loss=0.01297, ecapa_loss=0.000177, whisper_loss=0.08999, over 22669.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01139, ecapa_loss=0.0001921, whisper_loss=0.09083, over 3849457.26 frames. ], batch size: 92, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:38:55,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1255380.0, ans=0.2 2024-08-11 19:38:56,097 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-11 19:38:58,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1255380.0, ans=0.125 2024-08-11 19:39:00,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1255380.0, ans=0.0 2024-08-11 19:39:00,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0 2024-08-11 19:39:03,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1255380.0, ans=0.0 2024-08-11 19:39:36,887 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 19:39:38,310 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
30 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-11 19:39:38,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1255680.0, ans=0.125 2024-08-11 19:39:49,367 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:40:01,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1255880.0, ans=0.125 2024-08-11 19:40:02,015 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9650, loss[loss=0.1043, beats_loss=0.007373, ecapa_loss=0.0002524, whisper_loss=0.09444, over 16742.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01127, ecapa_loss=0.0001938, whisper_loss=0.09108, over 3824092.54 frames. ], batch size: 63, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:40:27,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1255980.0, ans=0.125 2024-08-11 19:40:27,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.826e+01 3.085e+01 3.592e+01 1.036e+02, threshold=6.169e+01, percent-clipped=1.0 2024-08-11 19:40:36,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1256080.0, ans=0.125 2024-08-11 19:40:58,695 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-11 19:41:00,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1256280.0, ans=0.125 2024-08-11 19:41:09,002 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9700, loss[loss=0.1232, beats_loss=0.01066, ecapa_loss=0.0001846, whisper_loss=0.1107, over 22405.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0112, ecapa_loss=0.0001944, whisper_loss=0.0919, over 3836070.24 frames. 
], batch size: 89, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:41:12,926 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 26 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-11 19:41:13,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1256380.0, ans=22.5 2024-08-11 19:41:16,797 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 19:41:22,168 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 28 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-11 19:41:25,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1256480.0, ans=0.2 2024-08-11 19:41:29,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1256480.0, ans=0.025 2024-08-11 19:41:29,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.72 vs. limit=15.0 2024-08-11 19:41:34,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1256580.0, ans=0.0 2024-08-11 19:41:39,442 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 19:41:44,355 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 19:41:48,532 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 24 from LS+wenet, 11 from Vox, 19 fro AS 2024-08-11 19:41:49,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.04 vs. 
limit=15.0 2024-08-11 19:42:06,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1256780.0, ans=0.125 2024-08-11 19:42:08,690 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-11 19:42:11,240 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 19:42:14,846 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9750, loss[loss=0.1059, beats_loss=0.01203, ecapa_loss=0.0001605, whisper_loss=0.09227, over 23099.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01115, ecapa_loss=0.0001933, whisper_loss=0.09197, over 3811485.44 frames. ], batch size: 90, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:42:23,288 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 19:42:24,675 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 19:42:26,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1256880.0, ans=0.125 2024-08-11 19:42:35,282 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 13 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 19:42:35,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1256980.0, ans=0.1 2024-08-11 19:42:39,176 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
22 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 19:42:39,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1256980.0, ans=0.1 2024-08-11 19:42:40,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.576e+01 2.817e+01 3.279e+01 5.572e+01, threshold=5.633e+01, percent-clipped=0.0 2024-08-11 19:42:43,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1257080.0, ans=0.0 2024-08-11 19:42:51,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1257080.0, ans=0.125 2024-08-11 19:43:01,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5 2024-08-11 19:43:13,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1257280.0, ans=0.05 2024-08-11 19:43:14,535 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 19:43:19,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1257280.0, ans=0.1 2024-08-11 19:43:21,178 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9800, loss[loss=0.1141, beats_loss=0.01219, ecapa_loss=0.0001803, whisper_loss=0.1001, over 21983.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01124, ecapa_loss=0.0001934, whisper_loss=0.09125, over 3801503.94 frames. ], batch size: 88, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:43:26,568 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
33 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-11 19:43:31,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.24 vs. limit=15.0 2024-08-11 19:43:32,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=22.5 2024-08-11 19:43:52,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1257580.0, ans=0.125 2024-08-11 19:43:53,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1257580.0, ans=0.1 2024-08-11 19:43:58,844 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-11 19:44:09,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1257680.0, ans=0.0 2024-08-11 19:44:26,439 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9850, loss[loss=0.1019, beats_loss=0.01121, ecapa_loss=0.0002054, whisper_loss=0.08868, over 21263.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01137, ecapa_loss=0.0001919, whisper_loss=0.09131, over 3837477.35 frames. ], batch size: 87, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:44:32,939 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 19:44:43,764 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 19:44:45,185 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 19:44:45,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1257980.0, ans=0.1 2024-08-11 19:44:51,541 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.685e+01 3.037e+01 3.617e+01 4.839e+01, threshold=6.074e+01, percent-clipped=0.0 2024-08-11 19:44:55,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1258080.0, ans=0.125 2024-08-11 19:45:05,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1258180.0, ans=0.025 2024-08-11 19:45:08,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1258180.0, ans=0.125 2024-08-11 19:45:24,224 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 19:45:31,863 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9900, loss[loss=0.0959, beats_loss=0.01382, ecapa_loss=0.0001447, whisper_loss=0.08064, over 18479.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01146, ecapa_loss=0.0001901, whisper_loss=0.09117, over 3855787.68 frames. ], batch size: 72, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:45:53,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1258480.0, ans=0.125 2024-08-11 19:46:07,868 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. 
limit=15.0 2024-08-11 19:46:12,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1258680.0, ans=0.125 2024-08-11 19:46:15,349 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2024-08-11 19:46:17,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1258680.0, ans=0.125 2024-08-11 19:46:23,486 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 19:46:31,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1258780.0, ans=0.0 2024-08-11 19:46:36,490 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 9950, loss[loss=0.1203, beats_loss=0.009273, ecapa_loss=0.0001774, whisper_loss=0.1092, over 15478.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01151, ecapa_loss=0.0001899, whisper_loss=0.09122, over 3852629.05 frames. ], batch size: 58, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:46:38,037 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 19:46:45,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1258880.0, ans=0.125 2024-08-11 19:46:52,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. 
limit=6.0 2024-08-11 19:46:58,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1258980.0, ans=0.125 2024-08-11 19:47:01,992 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.542e+01 2.819e+01 3.280e+01 8.897e+01, threshold=5.637e+01, percent-clipped=1.0 2024-08-11 19:47:10,152 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-11 19:47:10,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1259080.0, ans=0.05 2024-08-11 19:47:23,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1259180.0, ans=0.2 2024-08-11 19:47:39,993 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 27 from LS+wenet, 21 from Vox, 16 fro AS 2024-08-11 19:47:40,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1259280.0, ans=0.1 2024-08-11 19:47:40,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1259280.0, ans=0.0 2024-08-11 19:47:40,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=15.0 2024-08-11 19:47:41,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1259380.0, ans=0.1 2024-08-11 19:47:42,653 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10000, loss[loss=0.08864, beats_loss=0.01044, ecapa_loss=0.0002054, whisper_loss=0.07614, over 22140.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01144, ecapa_loss=0.0001911, whisper_loss=0.09171, over 3843853.84 frames. 
], batch size: 91, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:47:57,347 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 19:47:58,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1259480.0, ans=0.1 2024-08-11 19:48:11,823 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-11 19:48:14,197 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 19:48:36,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1259780.0, ans=0.125 2024-08-11 19:48:40,134 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 19:48:47,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10050, loss[loss=0.08826, beats_loss=0.01278, ecapa_loss=0.0002018, whisper_loss=0.07346, over 22603.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01138, ecapa_loss=0.0001916, whisper_loss=0.09224, over 3874615.04 frames. ], batch size: 94, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:48:50,512 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 19:48:52,973 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 36 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 19:48:59,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1259980.0, ans=0.0 2024-08-11 19:49:01,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1259980.0, ans=0.125 2024-08-11 19:49:04,958 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
22 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 19:49:12,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.712e+01 3.023e+01 3.510e+01 5.543e+01, threshold=6.045e+01, percent-clipped=0.0 2024-08-11 19:49:15,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1260080.0, ans=0.125 2024-08-11 19:49:26,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1260180.0, ans=0.1 2024-08-11 19:49:30,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1260180.0, ans=0.125 2024-08-11 19:49:33,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1260180.0, ans=0.1 2024-08-11 19:49:34,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1260180.0, ans=0.125 2024-08-11 19:49:40,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1260280.0, ans=0.05 2024-08-11 19:49:52,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10100, loss[loss=0.1105, beats_loss=0.01303, ecapa_loss=0.000177, whisper_loss=0.09569, over 22270.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01136, ecapa_loss=0.0001918, whisper_loss=0.09208, over 3909019.10 frames. 
], batch size: 90, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:49:59,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1260380.0, ans=0.125 2024-08-11 19:50:19,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1260580.0, ans=0.0 2024-08-11 19:50:27,078 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 19:50:28,346 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 19:50:49,430 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 19:50:54,287 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 19:50:56,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2024-08-11 19:50:58,148 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10150, loss[loss=0.1074, beats_loss=0.01163, ecapa_loss=0.0002025, whisper_loss=0.0937, over 21055.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01129, ecapa_loss=0.0001939, whisper_loss=0.0925, over 3958638.55 frames. ], batch size: 82, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:50:59,584 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 28 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 19:51:12,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1260980.0, ans=0.05 2024-08-11 19:51:23,121 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.653e+01 2.999e+01 3.558e+01 5.617e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 19:51:33,671 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 19:52:03,881 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10200, loss[loss=0.08539, beats_loss=0.01108, ecapa_loss=0.0002231, whisper_loss=0.07208, over 17143.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01126, ecapa_loss=0.0001939, whisper_loss=0.09259, over 3937573.77 frames. ], batch size: 72, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:52:25,093 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 19:52:31,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1261580.0, ans=0.0 2024-08-11 19:52:41,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1261580.0, ans=0.125 2024-08-11 19:52:47,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1261680.0, ans=0.125 2024-08-11 19:52:49,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1261680.0, ans=0.125 2024-08-11 19:52:49,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1261680.0, ans=0.125 2024-08-11 19:52:53,833 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 19:52:56,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1261780.0, ans=0.125 2024-08-11 19:52:59,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1261780.0, ans=0.1 2024-08-11 19:53:00,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1261780.0, ans=0.0 2024-08-11 19:53:09,013 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10250, loss[loss=0.1098, beats_loss=0.009328, ecapa_loss=0.0002238, whisper_loss=0.09828, over 19935.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01117, ecapa_loss=0.0001929, whisper_loss=0.09338, over 3954263.95 frames. ], batch size: 84, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:53:19,555 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 19:53:25,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1261980.0, ans=0.0 2024-08-11 19:53:33,859 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.624e+01 2.927e+01 3.242e+01 1.065e+02, threshold=5.855e+01, percent-clipped=3.0 2024-08-11 19:53:34,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1262080.0, ans=0.0 2024-08-11 19:53:38,260 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 19:53:43,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1262080.0, ans=0.125 2024-08-11 19:53:44,759 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
32 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 19:53:47,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1262180.0, ans=0.2 2024-08-11 19:54:14,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2024-08-11 19:54:15,333 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10300, loss[loss=0.1271, beats_loss=0.009341, ecapa_loss=0.000212, whisper_loss=0.1156, over 22901.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01114, ecapa_loss=0.0001929, whisper_loss=0.09395, over 3931139.43 frames. ], batch size: 90, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:54:15,464 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 19:54:21,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1262380.0, ans=0.125 2024-08-11 19:54:26,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1262380.0, ans=0.09899494936611666 2024-08-11 19:54:29,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1262480.0, ans=0.0 2024-08-11 19:54:41,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.29 vs. 
limit=10.0 2024-08-11 19:55:08,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1262780.0, ans=0.2 2024-08-11 19:55:20,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1262880.0, ans=0.0 2024-08-11 19:55:20,872 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10350, loss[loss=0.09453, beats_loss=0.01159, ecapa_loss=0.0002, whisper_loss=0.08094, over 19521.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.0112, ecapa_loss=0.0001927, whisper_loss=0.09369, over 3938147.83 frames. ], batch size: 81, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:55:27,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1262880.0, ans=0.0 2024-08-11 19:55:38,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1262980.0, ans=0.125 2024-08-11 19:55:45,853 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.735e+01 3.032e+01 3.459e+01 9.732e+01, threshold=6.064e+01, percent-clipped=1.0 2024-08-11 19:56:16,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1263280.0, ans=0.125 2024-08-11 19:56:20,259 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 19:56:26,520 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10400, loss[loss=0.1246, beats_loss=0.01037, ecapa_loss=0.0002476, whisper_loss=0.1117, over 21252.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01121, ecapa_loss=0.0001915, whisper_loss=0.09304, over 3897738.24 frames. ], batch size: 88, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:56:30,533 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
21 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 19:56:37,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1263380.0, ans=0.125 2024-08-11 19:56:50,217 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 19:56:55,234 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-11 19:56:58,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1263580.0, ans=0.0 2024-08-11 19:57:10,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1263680.0, ans=0.0 2024-08-11 19:57:16,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1263680.0, ans=0.0 2024-08-11 19:57:31,790 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10450, loss[loss=0.08686, beats_loss=0.01257, ecapa_loss=0.0002114, whisper_loss=0.07217, over 21453.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01112, ecapa_loss=0.0001929, whisper_loss=0.09329, over 3897533.21 frames. ], batch size: 92, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:57:32,014 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 19:57:36,130 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 40 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 19:57:46,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.47 vs. 
limit=15.0 2024-08-11 19:57:50,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1263980.0, ans=0.0 2024-08-11 19:57:56,619 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.575e+01 2.883e+01 3.290e+01 7.177e+01, threshold=5.767e+01, percent-clipped=1.0 2024-08-11 19:58:08,518 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-11 19:58:10,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1264180.0, ans=0.07 2024-08-11 19:58:10,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1264180.0, ans=0.125 2024-08-11 19:58:14,929 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 19:58:16,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=12.0 2024-08-11 19:58:23,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1264280.0, ans=0.0 2024-08-11 19:58:29,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1264280.0, ans=0.1 2024-08-11 19:58:35,791 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 19:58:36,969 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10500, loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001881, whisper_loss=0.0898, over 17407.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01109, ecapa_loss=0.0001928, whisper_loss=0.09411, over 3930529.03 frames. ], batch size: 71, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:58:47,948 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
21 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-11 19:59:03,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1264580.0, ans=0.05 2024-08-11 19:59:20,993 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 19:59:24,460 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.48 vs. limit=15.0 2024-08-11 19:59:43,874 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10550, loss[loss=0.1001, beats_loss=0.0118, ecapa_loss=0.0001893, whisper_loss=0.08644, over 23028.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01113, ecapa_loss=0.0001938, whisper_loss=0.09345, over 3914543.39 frames. ], batch size: 90, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:59:45,793 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.80 vs. 
limit=22.5 2024-08-11 19:59:46,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1264880.0, ans=0.0 2024-08-11 19:59:49,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1264880.0, ans=0.125 2024-08-11 20:00:08,116 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.610e+01 2.840e+01 3.443e+01 6.303e+01, threshold=5.679e+01, percent-clipped=1.0 2024-08-11 20:00:15,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1265080.0, ans=0.05 2024-08-11 20:00:16,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1265080.0, ans=0.1 2024-08-11 20:00:21,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2024-08-11 20:00:31,279 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 20:00:44,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1265280.0, ans=0.0 2024-08-11 20:00:48,280 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-11 20:00:49,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10600, loss[loss=0.09705, beats_loss=0.009089, ecapa_loss=0.0002133, whisper_loss=0.08583, over 13739.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01115, ecapa_loss=0.0001942, whisper_loss=0.09327, over 3920443.61 frames. 
], batch size: 54, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:00:53,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1265380.0, ans=0.125 2024-08-11 20:01:00,905 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 20:01:04,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1265480.0, ans=0.0 2024-08-11 20:01:07,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.41 vs. limit=15.0 2024-08-11 20:01:12,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=12.0 2024-08-11 20:01:29,418 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2024-08-11 20:01:40,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1265680.0, ans=0.2 2024-08-11 20:01:55,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10650, loss[loss=0.1094, beats_loss=0.009716, ecapa_loss=0.0002223, whisper_loss=0.09746, over 20323.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01108, ecapa_loss=0.0001936, whisper_loss=0.09349, over 3909707.39 frames. 
], batch size: 82, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:02:08,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1265980.0, ans=0.125 2024-08-11 20:02:17,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1265980.0, ans=0.2 2024-08-11 20:02:21,103 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.841e+01 3.157e+01 3.812e+01 6.518e+01, threshold=6.314e+01, percent-clipped=4.0 2024-08-11 20:02:28,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1266080.0, ans=0.0 2024-08-11 20:02:29,748 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.690e-03 2024-08-11 20:02:51,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1266280.0, ans=0.2 2024-08-11 20:03:02,762 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10700, loss[loss=0.09035, beats_loss=0.01309, ecapa_loss=0.0001745, whisper_loss=0.07551, over 17832.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01112, ecapa_loss=0.0001917, whisper_loss=0.09311, over 3888077.49 frames. ], batch size: 68, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:03:12,431 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 20:03:25,801 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 20:03:33,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1266580.0, ans=0.125 2024-08-11 20:03:35,302 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 20:04:09,652 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10750, loss[loss=0.1291, beats_loss=0.00783, ecapa_loss=0.0002037, whisper_loss=0.1192, over 16436.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01113, ecapa_loss=0.0001911, whisper_loss=0.09371, over 3901842.85 frames. ], batch size: 64, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:04:12,298 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 20:04:14,787 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 25 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 20:04:15,215 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.729e+00 2024-08-11 20:04:26,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1266980.0, ans=0.0 2024-08-11 20:04:36,020 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.628e+01 2.928e+01 3.321e+01 7.388e+01, threshold=5.856e+01, percent-clipped=1.0 2024-08-11 20:04:41,622 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.841e+01 2024-08-11 20:04:56,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1267180.0, ans=0.0 2024-08-11 20:04:56,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1267180.0, ans=0.125 2024-08-11 20:04:56,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1267180.0, ans=0.1 2024-08-11 20:05:02,314 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
35 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 20:05:05,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1267280.0, ans=0.2 2024-08-11 20:05:13,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1267280.0, ans=0.125 2024-08-11 20:05:15,947 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 20:05:16,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1267280.0, ans=0.125 2024-08-11 20:05:19,880 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10800, loss[loss=0.08394, beats_loss=0.01182, ecapa_loss=0.0001952, whisper_loss=0.07017, over 16155.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01105, ecapa_loss=0.0001909, whisper_loss=0.09455, over 3919580.32 frames. ], batch size: 66, lr: 6.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:05:24,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1267380.0, ans=0.125 2024-08-11 20:05:29,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1267380.0, ans=0.125 2024-08-11 20:05:37,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=1267480.0, ans=0.2 2024-08-11 20:05:38,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.00 vs. 
limit=22.5 2024-08-11 20:05:55,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1267580.0, ans=0.0 2024-08-11 20:06:00,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=1267580.0, ans=0.2 2024-08-11 20:06:32,376 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 20:06:35,634 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10850, loss[loss=0.115, beats_loss=0.01072, ecapa_loss=0.0002156, whisper_loss=0.1021, over 21814.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01105, ecapa_loss=0.0001907, whisper_loss=0.0948, over 3910854.15 frames. ], batch size: 90, lr: 6.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:06:42,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1267880.0, ans=0.125 2024-08-11 20:06:48,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1267880.0, ans=0.0 2024-08-11 20:06:49,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1267980.0, ans=0.0 2024-08-11 20:06:51,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1267980.0, ans=0.1 2024-08-11 20:06:53,278 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
21 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 20:07:05,465 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.647e+01 2.915e+01 3.241e+01 5.191e+01, threshold=5.831e+01, percent-clipped=0.0 2024-08-11 20:07:29,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1268180.0, ans=0.0 2024-08-11 20:07:32,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1268180.0, ans=0.125 2024-08-11 20:07:39,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1268280.0, ans=0.5 2024-08-11 20:07:40,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-08-11 20:07:47,137 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.14 vs. limit=15.0 2024-08-11 20:07:53,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10900, loss[loss=0.0696, beats_loss=0.01523, ecapa_loss=0.0001496, whisper_loss=0.05288, over 15898.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01104, ecapa_loss=0.0001903, whisper_loss=0.09486, over 3917973.45 frames. ], batch size: 64, lr: 6.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:08:04,458 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 20:08:06,644 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 20:08:16,857 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.99 vs. limit=15.0 2024-08-11 20:08:27,302 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
24 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 20:08:45,280 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 20:08:57,279 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 20:09:06,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1268780.0, ans=0.125 2024-08-11 20:09:09,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1268780.0, ans=0.1 2024-08-11 20:09:12,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1268780.0, ans=0.1 2024-08-11 20:09:19,557 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 10950, loss[loss=0.09053, beats_loss=0.01285, ecapa_loss=0.0001802, whisper_loss=0.07588, over 18985.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01116, ecapa_loss=0.0001879, whisper_loss=0.0951, over 3920349.70 frames. ], batch size: 81, lr: 6.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:10:01,318 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.627e+01 3.007e+01 3.464e+01 1.236e+02, threshold=6.014e+01, percent-clipped=3.0 2024-08-11 20:10:36,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1269180.0, ans=0.2 2024-08-11 20:10:36,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.48 vs. 
limit=22.5 2024-08-11 20:10:38,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1269180.0, ans=0.95 2024-08-11 20:10:44,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1269280.0, ans=0.125 2024-08-11 20:10:48,989 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 20:10:53,482 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 20:11:04,075 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 20:11:08,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11000, loss[loss=0.1203, beats_loss=0.009372, ecapa_loss=0.0001503, whisper_loss=0.1095, over 16772.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01116, ecapa_loss=0.0001909, whisper_loss=0.09482, over 3923043.44 frames. ], batch size: 60, lr: 6.92e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:11:12,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1269380.0, ans=0.0 2024-08-11 20:11:15,236 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-11 20:11:25,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1269380.0, ans=0.1 2024-08-11 20:11:41,245 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.12 vs. limit=10.0 2024-08-11 20:11:46,255 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
25 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 20:12:04,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1269680.0, ans=0.0 2024-08-11 20:12:49,486 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11050, loss[loss=0.1206, beats_loss=0.009081, ecapa_loss=0.0001862, whisper_loss=0.1096, over 18486.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.0112, ecapa_loss=0.000191, whisper_loss=0.09415, over 3895085.30 frames. ], batch size: 71, lr: 6.92e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:13:21,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1269980.0, ans=0.09899494936611666 2024-08-11 20:13:33,747 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.512e+01 2.845e+01 3.437e+01 6.269e+01, threshold=5.689e+01, percent-clipped=1.0 2024-08-11 20:13:42,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1270080.0, ans=0.125 2024-08-11 20:13:46,656 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 20:14:46,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11100, loss[loss=0.1127, beats_loss=0.01114, ecapa_loss=0.0001666, whisper_loss=0.0999, over 23501.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01124, ecapa_loss=0.0001903, whisper_loss=0.09364, over 3910802.42 frames. ], batch size: 91, lr: 6.92e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:14:48,770 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 20:14:51,829 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
33 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-11 20:15:04,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1270380.0, ans=0.125 2024-08-11 20:15:06,257 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 20:15:20,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1270480.0, ans=0.125 2024-08-11 20:15:25,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1270480.0, ans=0.0 2024-08-11 20:15:29,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-11 20:15:33,312 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 20:15:39,743 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.759e+02 2024-08-11 20:16:06,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1270680.0, ans=0.0 2024-08-11 20:16:06,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1270680.0, ans=0.1 2024-08-11 20:16:14,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1270680.0, ans=0.0 2024-08-11 20:16:21,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1270680.0, ans=0.125 2024-08-11 20:16:22,072 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
20 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 20:16:29,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1270780.0, ans=0.1 2024-08-11 20:16:46,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1270880.0, ans=0.1 2024-08-11 20:16:47,621 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11150, loss[loss=0.1106, beats_loss=0.01036, ecapa_loss=0.0001696, whisper_loss=0.09857, over 21803.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01121, ecapa_loss=0.000189, whisper_loss=0.09372, over 3897788.37 frames. ], batch size: 84, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:17:05,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.96 vs. limit=10.0 2024-08-11 20:17:31,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1270980.0, ans=0.125 2024-08-11 20:17:36,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1270980.0, ans=0.0 2024-08-11 20:17:39,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.507e+01 2.811e+01 3.221e+01 4.609e+01, threshold=5.623e+01, percent-clipped=0.0 2024-08-11 20:18:24,865 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-11 20:18:30,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1271280.0, ans=0.1 2024-08-11 20:18:32,001 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
22 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-11 20:18:32,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1271280.0, ans=0.0 2024-08-11 20:18:33,734 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 20:18:36,891 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11200, loss[loss=0.09667, beats_loss=0.01443, ecapa_loss=0.0002211, whisper_loss=0.08004, over 21419.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01118, ecapa_loss=0.0001907, whisper_loss=0.09395, over 3885546.73 frames. ], batch size: 87, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:19:03,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1271480.0, ans=0.2 2024-08-11 20:19:18,883 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-11 20:19:46,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1271780.0, ans=0.2 2024-08-11 20:19:59,789 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 20:20:05,083 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11250, loss[loss=0.07777, beats_loss=0.01183, ecapa_loss=0.0001614, whisper_loss=0.06433, over 20388.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01121, ecapa_loss=0.0001893, whisper_loss=0.09328, over 3871864.85 frames. ], batch size: 81, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:20:11,989 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 20:20:35,530 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
17 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 20:20:38,136 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.603e+01 2.926e+01 3.414e+01 6.111e+01, threshold=5.851e+01, percent-clipped=1.0 2024-08-11 20:20:49,593 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 20:21:00,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1272180.0, ans=0.125 2024-08-11 20:21:02,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1272180.0, ans=0.2 2024-08-11 20:21:03,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1272180.0, ans=0.125 2024-08-11 20:21:34,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11300, loss[loss=0.1123, beats_loss=0.01202, ecapa_loss=0.0001641, whisper_loss=0.09866, over 19885.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01127, ecapa_loss=0.0001872, whisper_loss=0.09305, over 3900420.09 frames. ], batch size: 78, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:21:35,999 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.64 vs. limit=15.0 2024-08-11 20:21:37,999 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 20:21:39,944 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 20:22:10,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1272580.0, ans=0.2 2024-08-11 20:22:18,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1272580.0, ans=0.125 2024-08-11 20:22:30,106 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 16 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 20:22:44,902 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.40 vs. limit=22.5 2024-08-11 20:23:02,487 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.33 vs. limit=15.0 2024-08-11 20:23:04,930 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11350, loss[loss=0.1012, beats_loss=0.01141, ecapa_loss=0.0001737, whisper_loss=0.08806, over 21439.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01121, ecapa_loss=0.0001881, whisper_loss=0.09261, over 3900253.51 frames. 
], batch size: 87, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:23:05,706 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 20:23:12,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1272880.0, ans=0.125 2024-08-11 20:23:14,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1272880.0, ans=0.125 2024-08-11 20:23:39,747 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.545e+01 2.892e+01 3.550e+01 1.179e+02, threshold=5.785e+01, percent-clipped=1.0 2024-08-11 20:24:11,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1273180.0, ans=0.125 2024-08-11 20:24:20,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1273280.0, ans=0.05 2024-08-11 20:24:27,489 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=22.5 2024-08-11 20:24:35,039 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11400, loss[loss=0.1027, beats_loss=0.01251, ecapa_loss=0.0001643, whisper_loss=0.08857, over 20004.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01125, ecapa_loss=0.0001883, whisper_loss=0.0924, over 3907108.06 frames. ], batch size: 81, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:24:38,103 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-11 20:24:52,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1273480.0, ans=0.0 2024-08-11 20:25:05,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1273480.0, ans=0.125 2024-08-11 20:25:07,259 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 20:25:15,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1273580.0, ans=0.125 2024-08-11 20:25:23,729 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.620e-02 2024-08-11 20:25:40,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1273680.0, ans=0.125 2024-08-11 20:25:49,938 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 20:26:03,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11450, loss[loss=0.07098, beats_loss=0.01502, ecapa_loss=0.0001512, whisper_loss=0.05445, over 22729.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01126, ecapa_loss=0.0001899, whisper_loss=0.09187, over 3891054.04 frames. ], batch size: 94, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:26:07,486 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 20:26:09,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1273880.0, ans=0.125 2024-08-11 20:26:14,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1273880.0, ans=0.0 2024-08-11 20:26:24,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2024-08-11 20:26:37,113 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 20:26:38,188 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.725e+01 3.153e+01 3.598e+01 9.857e+01, threshold=6.305e+01, percent-clipped=2.0 2024-08-11 20:27:00,820 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2024-08-11 20:27:09,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.59 vs. limit=15.0 2024-08-11 20:27:24,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1274280.0, ans=0.0 2024-08-11 20:27:32,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1274380.0, ans=0.2 2024-08-11 20:27:32,999 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11500, loss[loss=0.1235, beats_loss=0.006741, ecapa_loss=0.0002203, whisper_loss=0.1145, over 18661.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01123, ecapa_loss=0.0001906, whisper_loss=0.09194, over 3894035.67 frames. ], batch size: 71, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:27:43,428 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-11 20:27:49,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1274380.0, ans=0.125 2024-08-11 20:28:09,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1274580.0, ans=0.0 2024-08-11 20:28:11,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1274580.0, ans=0.1 2024-08-11 20:28:51,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1274780.0, ans=0.0 2024-08-11 20:29:07,105 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11550, loss[loss=0.112, beats_loss=0.01153, ecapa_loss=0.0001703, whisper_loss=0.09874, over 20381.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01107, ecapa_loss=0.0001911, whisper_loss=0.09319, over 3915518.80 frames. ], batch size: 81, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:29:19,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1274880.0, ans=0.0 2024-08-11 20:29:29,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1274980.0, ans=0.125 2024-08-11 20:29:30,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.73 vs. 
limit=22.5 2024-08-11 20:29:32,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1274980.0, ans=0.125 2024-08-11 20:29:43,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.690e+01 2.944e+01 3.463e+01 4.757e+01, threshold=5.888e+01, percent-clipped=0.0 2024-08-11 20:29:43,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1275080.0, ans=0.0 2024-08-11 20:29:45,564 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-11 20:29:49,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1275080.0, ans=0.125 2024-08-11 20:30:10,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1275180.0, ans=0.125 2024-08-11 20:30:15,540 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 20:30:22,050 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.31 vs. limit=10.0 2024-08-11 20:30:28,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1275280.0, ans=0.0 2024-08-11 20:30:28,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1275280.0, ans=0.0 2024-08-11 20:30:37,997 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11600, loss[loss=0.1229, beats_loss=0.009711, ecapa_loss=0.0001838, whisper_loss=0.1113, over 17164.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01107, ecapa_loss=0.0001913, whisper_loss=0.0932, over 3931200.29 frames. 
], batch size: 68, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:30:53,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1275480.0, ans=0.2 2024-08-11 20:31:00,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1275480.0, ans=0.125 2024-08-11 20:31:12,998 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-11 20:31:23,238 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 18 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 20:31:30,289 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 20:31:30,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1275680.0, ans=0.125 2024-08-11 20:31:32,138 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 20:31:32,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1275680.0, ans=0.05 2024-08-11 20:31:45,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.72 vs. limit=6.0 2024-08-11 20:31:55,496 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-11 20:32:06,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11650, loss[loss=0.1094, beats_loss=0.01061, ecapa_loss=0.0002235, whisper_loss=0.09653, over 22388.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01111, ecapa_loss=0.0001903, whisper_loss=0.09264, over 3911610.41 frames. ], batch size: 92, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:32:13,644 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
31 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 20:32:14,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1275880.0, ans=0.125 2024-08-11 20:32:17,884 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 20:32:29,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0 2024-08-11 20:32:41,747 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2024-08-11 20:32:41,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1275980.0, ans=15.0 2024-08-11 20:32:44,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.564e+01 2.809e+01 3.170e+01 4.570e+01, threshold=5.617e+01, percent-clipped=0.0 2024-08-11 20:32:54,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1276080.0, ans=0.125 2024-08-11 20:32:58,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1276080.0, ans=0.07 2024-08-11 20:32:59,790 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 20:33:02,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1276080.0, ans=0.125 2024-08-11 20:33:18,927 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
24 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 20:33:21,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1276180.0, ans=0.2 2024-08-11 20:33:23,051 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 10 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 20:33:25,165 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 20:33:25,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1276280.0, ans=0.125 2024-08-11 20:33:25,780 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-11 20:33:43,258 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11700, loss[loss=0.08528, beats_loss=0.01524, ecapa_loss=0.0001646, whisper_loss=0.06839, over 20818.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01123, ecapa_loss=0.0001896, whisper_loss=0.09197, over 3904545.79 frames. 
], batch size: 83, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:34:36,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1276680.0, ans=0.0 2024-08-11 20:34:40,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1276680.0, ans=0.1 2024-08-11 20:34:56,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1276780.0, ans=0.1 2024-08-11 20:35:10,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1276780.0, ans=0.05 2024-08-11 20:35:14,972 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11750, loss[loss=0.1083, beats_loss=0.009607, ecapa_loss=0.0002149, whisper_loss=0.09654, over 21642.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01129, ecapa_loss=0.0001912, whisper_loss=0.09203, over 3917350.21 frames. ], batch size: 83, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:35:26,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1276880.0, ans=0.125 2024-08-11 20:35:33,201 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
19 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-11 20:35:49,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.651e+01 2.904e+01 3.391e+01 1.042e+02, threshold=5.808e+01, percent-clipped=2.0 2024-08-11 20:35:50,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1277080.0, ans=0.1 2024-08-11 20:36:03,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1277080.0, ans=0.0 2024-08-11 20:36:23,708 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 20:36:25,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1277280.0, ans=0.07 2024-08-11 20:36:27,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1277280.0, ans=0.125 2024-08-11 20:36:42,569 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 20:36:43,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11800, loss[loss=0.1237, beats_loss=0.009881, ecapa_loss=0.0001949, whisper_loss=0.1119, over 20835.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01134, ecapa_loss=0.0001895, whisper_loss=0.09222, over 3914096.05 frames. ], batch size: 80, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:36:44,504 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 20:36:44,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1277380.0, ans=0.0 2024-08-11 20:36:49,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1277380.0, ans=0.125 2024-08-11 20:36:52,875 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 20:36:55,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1277380.0, ans=0.0 2024-08-11 20:36:57,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1277380.0, ans=0.125 2024-08-11 20:37:01,592 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 20:37:05,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1277480.0, ans=0.05 2024-08-11 20:37:23,013 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.481e-01 2024-08-11 20:37:44,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1277680.0, ans=0.0 2024-08-11 20:37:56,618 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 20:37:58,399 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 20:38:10,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1277880.0, ans=0.1 2024-08-11 20:38:12,019 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11850, loss[loss=0.1147, beats_loss=0.01103, ecapa_loss=0.0001918, whisper_loss=0.1018, over 16143.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.01134, ecapa_loss=0.0001888, whisper_loss=0.09259, over 3911543.94 frames. ], batch size: 66, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:38:27,760 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.565e+05 2024-08-11 20:38:43,357 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.625e+01 2.967e+01 3.340e+01 5.309e+01, threshold=5.933e+01, percent-clipped=0.0 2024-08-11 20:39:00,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1278080.0, ans=0.0 2024-08-11 20:39:20,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1278280.0, ans=0.125 2024-08-11 20:39:38,129 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11900, loss[loss=0.1122, beats_loss=0.01111, ecapa_loss=0.0001862, whisper_loss=0.0992, over 18959.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01131, ecapa_loss=0.0001895, whisper_loss=0.09284, over 3907986.32 frames. ], batch size: 72, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:39:53,392 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=22.5 2024-08-11 20:39:58,061 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 23 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-11 20:40:02,161 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-11 20:40:03,673 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
23 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-11 20:40:11,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1278580.0, ans=0.07 2024-08-11 20:40:14,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1278580.0, ans=0.125 2024-08-11 20:40:29,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1278680.0, ans=0.0 2024-08-11 20:40:36,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1278680.0, ans=0.125 2024-08-11 20:40:46,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1278780.0, ans=0.125 2024-08-11 20:40:55,618 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 20:41:00,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1278780.0, ans=0.125 2024-08-11 20:41:03,516 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 11950, loss[loss=0.096, beats_loss=0.0108, ecapa_loss=0.0001844, whisper_loss=0.08336, over 21966.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01118, ecapa_loss=0.0001915, whisper_loss=0.09269, over 3847068.01 frames. ], batch size: 89, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:41:16,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.05 vs. 
limit=15.0 2024-08-11 20:41:19,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1278980.0, ans=0.125 2024-08-11 20:41:34,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1278980.0, ans=0.1 2024-08-11 20:41:37,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.570e+01 2.836e+01 3.237e+01 6.228e+01, threshold=5.672e+01, percent-clipped=0.0 2024-08-11 20:41:37,993 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 20:41:43,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1279080.0, ans=0.035 2024-08-11 20:41:45,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0 2024-08-11 20:41:52,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1279080.0, ans=0.0 2024-08-11 20:41:54,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1279080.0, ans=0.125 2024-08-11 20:41:56,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1279180.0, ans=0.1 2024-08-11 20:42:06,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1279180.0, ans=0.125 2024-08-11 20:42:29,018 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.21 vs. 
limit=15.0 2024-08-11 20:42:30,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1279280.0, ans=0.0 2024-08-11 20:42:33,182 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12000, loss[loss=0.09262, beats_loss=0.01228, ecapa_loss=0.0001618, whisper_loss=0.07872, over 20571.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01122, ecapa_loss=0.00019, whisper_loss=0.09234, over 3858797.44 frames. ], batch size: 79, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:42:33,182 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 20:43:16,128 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on ASR_libri: loss=0.2562, beats_loss=0, ecapa_loss=0.0006123, whisper_loss=0.25, over 922467.00 frames. 2024-08-11 20:43:35,255 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on SV_voxceleb1: loss=0.005094, beats_loss=0, ecapa_loss=0.0005094, whisper_loss=0, over 939242.00 frames. 2024-08-11 20:44:20,608 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0633, 3.7831, 3.1964, 3.5604], device='cuda:2') 2024-08-11 20:45:30,602 INFO [train_multi_KD3.py:1149] (2/4) Epoch 9, validation on AT_audioset: loss=0.02487, beats_loss=0.02487, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-11 20:45:30,606 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 20:45:39,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1279380.0, ans=10.0 2024-08-11 20:45:53,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1279480.0, ans=0.0 2024-08-11 20:45:58,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1279480.0, ans=0.0 2024-08-11 20:46:09,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2024-08-11 20:46:26,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1279680.0, ans=0.125 2024-08-11 20:46:26,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1279680.0, ans=0.125 2024-08-11 20:46:37,186 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 20:46:39,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.93 vs. limit=22.5 2024-08-11 20:46:59,081 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12050, loss[loss=0.1322, beats_loss=0.008172, ecapa_loss=0.0002272, whisper_loss=0.1218, over 14402.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01132, ecapa_loss=0.0001901, whisper_loss=0.09134, over 3838301.52 frames. ], batch size: 55, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:47:14,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.62 vs. 
limit=15.0 2024-08-11 20:47:16,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1279980.0, ans=0.125 2024-08-11 20:47:26,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1279980.0, ans=0.125 2024-08-11 20:47:32,931 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.178e+01 2.712e+01 3.113e+01 3.609e+01 6.588e+01, threshold=6.227e+01, percent-clipped=3.0 2024-08-11 20:47:42,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0 2024-08-11 20:47:49,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1280080.0, ans=0.125 2024-08-11 20:48:27,224 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12100, loss[loss=0.1095, beats_loss=0.0115, ecapa_loss=0.0001724, whisper_loss=0.09629, over 21785.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01133, ecapa_loss=0.0001889, whisper_loss=0.09149, over 3854533.72 frames. ], batch size: 86, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:48:29,603 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 20:48:32,583 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.983e+00 2024-08-11 20:48:50,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1280480.0, ans=0.125 2024-08-11 20:48:52,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1280480.0, ans=0.125 2024-08-11 20:48:57,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1280480.0, ans=0.0 2024-08-11 20:49:02,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1280580.0, ans=0.125 2024-08-11 20:49:31,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1280680.0, ans=0.0 2024-08-11 20:49:36,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1280780.0, ans=0.0 2024-08-11 20:49:41,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1280780.0, ans=0.0 2024-08-11 20:49:55,014 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12150, loss[loss=0.09994, beats_loss=0.01158, ecapa_loss=0.000183, whisper_loss=0.08652, over 16687.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01125, ecapa_loss=0.0001887, whisper_loss=0.09177, over 3869488.30 frames. 
], batch size: 69, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:50:19,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1280980.0, ans=0.2 2024-08-11 20:50:26,563 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.646e+01 3.020e+01 3.424e+01 5.278e+01, threshold=6.041e+01, percent-clipped=0.0 2024-08-11 20:50:46,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.47 vs. limit=10.0 2024-08-11 20:50:53,610 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 20:51:03,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1281280.0, ans=0.2 2024-08-11 20:51:18,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1281380.0, ans=0.04949747468305833 2024-08-11 20:51:19,631 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12200, loss[loss=0.1186, beats_loss=0.01072, ecapa_loss=0.0002019, whisper_loss=0.1059, over 22425.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01121, ecapa_loss=0.0001883, whisper_loss=0.09275, over 3869446.22 frames. ], batch size: 91, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:51:44,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1281480.0, ans=0.125 2024-08-11 20:52:43,690 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12250, loss[loss=0.09213, beats_loss=0.01148, ecapa_loss=0.0001878, whisper_loss=0.07877, over 21539.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01119, ecapa_loss=0.0001881, whisper_loss=0.09207, over 3866224.36 frames. 
], batch size: 88, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:52:46,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1281880.0, ans=0.125 2024-08-11 20:52:48,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1281880.0, ans=0.1 2024-08-11 20:53:11,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1281980.0, ans=0.125 2024-08-11 20:53:16,093 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.045e+01 2.578e+01 2.932e+01 3.420e+01 1.649e+02, threshold=5.864e+01, percent-clipped=1.0 2024-08-11 20:53:27,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1282080.0, ans=0.125 2024-08-11 20:53:29,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1282080.0, ans=0.125 2024-08-11 20:53:38,556 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 10 from Vox, 36 fro AS 2024-08-11 20:53:45,192 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 20:54:08,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12300, loss[loss=0.09941, beats_loss=0.01404, ecapa_loss=0.0001844, whisper_loss=0.08353, over 18023.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01126, ecapa_loss=0.0001882, whisper_loss=0.0919, over 3876650.04 frames. ], batch size: 75, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:54:19,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1282380.0, ans=0.2 2024-08-11 20:54:26,681 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
11 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 20:54:53,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1282580.0, ans=0.0 2024-08-11 20:55:15,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1282680.0, ans=0.125 2024-08-11 20:55:21,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1282780.0, ans=0.125 2024-08-11 20:55:21,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1282780.0, ans=0.04949747468305833 2024-08-11 20:55:35,018 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12350, loss[loss=0.1245, beats_loss=0.01112, ecapa_loss=0.0001971, whisper_loss=0.1114, over 19936.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01118, ecapa_loss=0.0001908, whisper_loss=0.09233, over 3860765.11 frames. ], batch size: 80, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:56:01,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=12.0 2024-08-11 20:56:06,579 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.524e+01 2.954e+01 3.299e+01 5.655e+01, threshold=5.908e+01, percent-clipped=0.0 2024-08-11 20:56:30,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1283180.0, ans=0.2 2024-08-11 20:56:36,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1283180.0, ans=0.125 2024-08-11 20:56:40,872 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 20:56:54,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1283280.0, ans=0.1 2024-08-11 20:56:55,364 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 20:57:01,288 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12400, loss[loss=0.08901, beats_loss=0.0101, ecapa_loss=0.0002124, whisper_loss=0.07679, over 19426.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01116, ecapa_loss=0.0001908, whisper_loss=0.09253, over 3868921.76 frames. ], batch size: 80, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:57:16,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1283480.0, ans=0.5 2024-08-11 20:58:13,333 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2024-08-11 20:58:21,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1283780.0, ans=0.125 2024-08-11 20:58:25,687 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12450, loss[loss=0.09592, beats_loss=0.01212, ecapa_loss=0.000188, whisper_loss=0.08192, over 22336.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01124, ecapa_loss=0.0001896, whisper_loss=0.0917, over 3859471.24 frames. ], batch size: 92, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:58:46,437 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 20:58:46,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1283980.0, ans=0.125 2024-08-11 20:58:56,127 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.629e+01 2.973e+01 3.425e+01 5.618e+01, threshold=5.946e+01, percent-clipped=0.0 2024-08-11 20:58:58,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1284080.0, ans=0.125 2024-08-11 20:58:59,585 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 33 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 20:59:48,213 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12500, loss[loss=0.1228, beats_loss=0.007345, ecapa_loss=0.0002215, whisper_loss=0.1132, over 15600.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01118, ecapa_loss=0.0001896, whisper_loss=0.09221, over 3834538.06 frames. ], batch size: 60, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:59:49,953 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 20:59:53,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1284380.0, ans=0.2 2024-08-11 20:59:59,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1284380.0, ans=0.1 2024-08-11 21:00:01,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1284380.0, ans=0.0 2024-08-11 21:00:03,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1284380.0, ans=0.125 2024-08-11 21:00:05,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1284480.0, ans=0.0 2024-08-11 21:00:09,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1284480.0, ans=0.125 2024-08-11 21:00:16,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1284480.0, ans=0.125 2024-08-11 21:00:19,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.96 vs. 
limit=22.5 2024-08-11 21:00:27,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1284580.0, ans=0.1 2024-08-11 21:00:36,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1284580.0, ans=0.035 2024-08-11 21:00:47,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1284680.0, ans=0.2 2024-08-11 21:00:54,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1284780.0, ans=0.125 2024-08-11 21:01:05,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1284780.0, ans=0.2 2024-08-11 21:01:14,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12550, loss[loss=0.1068, beats_loss=0.01288, ecapa_loss=0.0001651, whisper_loss=0.09226, over 23174.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01125, ecapa_loss=0.0001882, whisper_loss=0.0924, over 3896661.49 frames. ], batch size: 89, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:01:20,925 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.35 vs. limit=15.0 2024-08-11 21:01:44,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.709e+01 3.086e+01 3.503e+01 6.566e+01, threshold=6.173e+01, percent-clipped=1.0 2024-08-11 21:01:50,352 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
18 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-11 21:01:52,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1285080.0, ans=0.125 2024-08-11 21:02:06,217 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=12.0 2024-08-11 21:02:32,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1285280.0, ans=0.125 2024-08-11 21:02:35,901 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12600, loss[loss=0.1064, beats_loss=0.009737, ecapa_loss=0.0001865, whisper_loss=0.0948, over 17874.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01126, ecapa_loss=0.000189, whisper_loss=0.09332, over 3913928.17 frames. ], batch size: 71, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:02:40,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.53 vs. limit=15.0 2024-08-11 21:02:51,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0 2024-08-11 21:03:27,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1285680.0, ans=0.125 2024-08-11 21:03:57,189 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12650, loss[loss=0.1132, beats_loss=0.0117, ecapa_loss=0.0001975, whisper_loss=0.09954, over 20993.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01134, ecapa_loss=0.0001899, whisper_loss=0.09281, over 3905669.79 frames. ], batch size: 83, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:04:22,828 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 21:04:24,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1285980.0, ans=0.09899494936611666 2024-08-11 21:04:31,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.568e+01 2.843e+01 3.370e+01 6.340e+01, threshold=5.685e+01, percent-clipped=1.0 2024-08-11 21:04:51,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1286180.0, ans=0.2 2024-08-11 21:04:54,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1286180.0, ans=0.2 2024-08-11 21:04:59,665 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 21:05:25,561 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12700, loss[loss=0.1146, beats_loss=0.01005, ecapa_loss=0.0001931, whisper_loss=0.1026, over 23209.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01124, ecapa_loss=0.0001896, whisper_loss=0.09346, over 3898020.19 frames. ], batch size: 93, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:05:52,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2024-08-11 21:06:02,841 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 20 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-11 21:06:40,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1286780.0, ans=0.125 2024-08-11 21:06:41,003 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.96 vs. 
limit=22.5 2024-08-11 21:06:47,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12750, loss[loss=0.08229, beats_loss=0.0136, ecapa_loss=0.0001651, whisper_loss=0.06704, over 21982.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01133, ecapa_loss=0.0001898, whisper_loss=0.09292, over 3908767.51 frames. ], batch size: 88, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:07:01,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1286880.0, ans=0.0 2024-08-11 21:07:16,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1286980.0, ans=0.0 2024-08-11 21:07:18,096 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 21:07:19,457 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.648e+01 3.001e+01 3.436e+01 1.023e+02, threshold=6.002e+01, percent-clipped=1.0 2024-08-11 21:07:22,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1287080.0, ans=0.1 2024-08-11 21:07:27,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1287080.0, ans=0.5 2024-08-11 21:07:36,882 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.62 vs. 
limit=10.0 2024-08-11 21:07:40,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1287180.0, ans=0.125 2024-08-11 21:08:06,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1287280.0, ans=0.1 2024-08-11 21:08:15,114 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=22.5 2024-08-11 21:08:16,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12800, loss[loss=0.1132, beats_loss=0.01333, ecapa_loss=0.0001829, whisper_loss=0.09802, over 21946.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01127, ecapa_loss=0.0001913, whisper_loss=0.09324, over 3881690.83 frames. ], batch size: 88, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:08:33,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1287480.0, ans=0.125 2024-08-11 21:08:50,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1287580.0, ans=0.125 2024-08-11 21:09:05,868 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 21:09:22,300 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 21:09:22,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1287780.0, ans=0.125 2024-08-11 21:09:36,612 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12850, loss[loss=0.1068, beats_loss=0.01485, ecapa_loss=0.000136, whisper_loss=0.09063, over 22501.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0114, ecapa_loss=0.0001897, whisper_loss=0.09242, over 3864711.34 frames. 
], batch size: 88, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:09:44,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1287880.0, ans=0.1 2024-08-11 21:09:57,836 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 21:10:09,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.541e+01 2.885e+01 3.297e+01 4.788e+01, threshold=5.770e+01, percent-clipped=0.0 2024-08-11 21:10:37,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1288180.0, ans=0.0 2024-08-11 21:10:41,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1288180.0, ans=0.125 2024-08-11 21:10:44,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1288280.0, ans=0.04949747468305833 2024-08-11 21:10:52,687 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.18 vs. limit=10.0 2024-08-11 21:11:00,404 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12900, loss[loss=0.09029, beats_loss=0.01473, ecapa_loss=0.0002051, whisper_loss=0.0735, over 22423.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01142, ecapa_loss=0.0001903, whisper_loss=0.09176, over 3845263.15 frames. ], batch size: 93, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:11:00,551 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 21:11:11,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1288380.0, ans=0.125 2024-08-11 21:11:17,597 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 21:11:19,362 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-11 21:11:24,121 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 21:11:28,826 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.743e-01 2024-08-11 21:11:50,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=1288680.0, ans=0.02 2024-08-11 21:11:59,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1288680.0, ans=0.125 2024-08-11 21:12:18,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1288780.0, ans=0.0 2024-08-11 21:12:20,468 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-11 21:12:21,636 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 12950, loss[loss=0.1181, beats_loss=0.008124, ecapa_loss=0.0002051, whisper_loss=0.1079, over 15240.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01131, ecapa_loss=0.0001904, whisper_loss=0.09202, over 3842977.32 frames. ], batch size: 58, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:12:23,771 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-11 21:12:35,675 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
23 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 21:12:44,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1288980.0, ans=0.125 2024-08-11 21:12:54,620 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.715e+01 3.125e+01 3.606e+01 5.827e+01, threshold=6.249e+01, percent-clipped=1.0 2024-08-11 21:12:56,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1289080.0, ans=0.0 2024-08-11 21:13:45,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13000, loss[loss=0.1209, beats_loss=0.008639, ecapa_loss=0.000185, whisper_loss=0.1104, over 14183.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0113, ecapa_loss=0.0001889, whisper_loss=0.09276, over 3859199.17 frames. ], batch size: 53, lr: 6.87e-03, grad_scale: 2.305843009213694e+18 2024-08-11 21:13:47,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1289380.0, ans=0.125 2024-08-11 21:14:00,384 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 21:14:09,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1289480.0, ans=0.1 2024-08-11 21:14:12,827 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0 2024-08-11 21:14:36,083 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 21:14:36,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1289680.0, ans=0.2 2024-08-11 21:14:41,988 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 21:14:44,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1289680.0, ans=0.1 2024-08-11 21:14:47,428 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 21:14:47,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1289680.0, ans=0.0 2024-08-11 21:15:02,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1289780.0, ans=0.125 2024-08-11 21:15:10,632 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 21:15:12,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13050, loss[loss=0.09517, beats_loss=0.01211, ecapa_loss=0.00018, whisper_loss=0.08126, over 20814.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01129, ecapa_loss=0.000189, whisper_loss=0.09273, over 3844965.38 frames. ], batch size: 82, lr: 6.86e-03, grad_scale: 2.305843009213694e+18 2024-08-11 21:15:13,536 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 21:15:15,353 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 21:15:24,999 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 21:15:26,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1289980.0, ans=0.125 2024-08-11 21:15:36,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1289980.0, ans=0.125 2024-08-11 21:15:43,474 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.491e+01 2.763e+01 3.152e+01 5.442e+01, threshold=5.527e+01, percent-clipped=0.0 2024-08-11 21:15:50,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-11 21:15:50,680 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 21:15:50,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1290080.0, ans=0.2 2024-08-11 21:15:56,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1290080.0, ans=0.0 2024-08-11 21:16:02,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1290180.0, ans=0.0 2024-08-11 21:16:22,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1290280.0, ans=0.125 2024-08-11 21:16:26,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1290280.0, ans=0.125 2024-08-11 21:16:29,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1290280.0, ans=0.125 2024-08-11 21:16:34,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13100, loss[loss=0.1161, 
beats_loss=0.01237, ecapa_loss=0.0001862, whisper_loss=0.1019, over 19782.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01128, ecapa_loss=0.0001896, whisper_loss=0.0919, over 3847026.96 frames. ], batch size: 85, lr: 6.86e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:16:35,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1290380.0, ans=0.125 2024-08-11 21:16:44,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=12.0 2024-08-11 21:17:15,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1290580.0, ans=0.0 2024-08-11 21:17:28,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1290680.0, ans=0.125 2024-08-11 21:17:39,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1290680.0, ans=0.125 2024-08-11 21:17:39,405 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-11 21:17:42,544 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 21:17:48,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1290780.0, ans=0.5 2024-08-11 21:17:51,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1290780.0, ans=0.0 2024-08-11 21:17:59,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.76 vs. 
limit=15.0 2024-08-11 21:18:02,067 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13150, loss[loss=0.09543, beats_loss=0.01121, ecapa_loss=0.0001734, whisper_loss=0.08248, over 16664.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01122, ecapa_loss=0.0001906, whisper_loss=0.09307, over 3874584.76 frames. ], batch size: 62, lr: 6.86e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:18:02,238 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 21:18:09,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1290880.0, ans=0.125 2024-08-11 21:18:21,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.21 vs. limit=22.5 2024-08-11 21:18:34,680 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 21:18:36,006 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.521e+01 2.887e+01 3.350e+01 6.017e+01, threshold=5.775e+01, percent-clipped=1.0 2024-08-11 21:18:38,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1291080.0, ans=0.125 2024-08-11 21:18:38,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1291080.0, ans=0.05 2024-08-11 21:19:00,183 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 11 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 21:19:07,980 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 18 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 21:19:15,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. 
limit=15.0 2024-08-11 21:19:24,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13200, loss[loss=0.118, beats_loss=0.01123, ecapa_loss=0.0002152, whisper_loss=0.1047, over 21849.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01124, ecapa_loss=0.0001904, whisper_loss=0.0929, over 3836331.33 frames. ], batch size: 91, lr: 6.86e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:19:27,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2024-08-11 21:19:29,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1291380.0, ans=0.1 2024-08-11 21:19:30,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1291380.0, ans=0.0 2024-08-11 21:19:38,033 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.389e+00 2024-08-11 21:19:46,932 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2024-08-11 21:19:58,744 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 33 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 21:20:03,636 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-11 21:20:06,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1291580.0, ans=0.05 2024-08-11 21:20:07,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1291580.0, ans=0.125 2024-08-11 21:20:39,333 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-11 21:20:40,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1291780.0, ans=0.0 2024-08-11 21:20:41,953 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 21:20:42,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1291780.0, ans=0.125 2024-08-11 21:20:47,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1291880.0, ans=0.125 2024-08-11 21:20:48,420 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13250, loss[loss=0.1043, beats_loss=0.01119, ecapa_loss=0.0001534, whisper_loss=0.09154, over 22912.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01116, ecapa_loss=0.0001931, whisper_loss=0.09293, over 3833390.22 frames. ], batch size: 89, lr: 6.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:20:49,335 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.02 vs. limit=15.0 2024-08-11 21:21:03,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1291980.0, ans=0.0 2024-08-11 21:21:21,283 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.554e+01 3.002e+01 3.444e+01 4.623e+01, threshold=6.004e+01, percent-clipped=0.0 2024-08-11 21:21:21,534 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
18 from LS+wenet, 32 from Vox, 29 fro AS 2024-08-11 21:21:58,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1292280.0, ans=15.0 2024-08-11 21:22:05,720 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13300, loss[loss=0.1139, beats_loss=0.009108, ecapa_loss=0.0001749, whisper_loss=0.103, over 20715.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01112, ecapa_loss=0.0001908, whisper_loss=0.09331, over 3828450.80 frames. ], batch size: 79, lr: 6.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:22:10,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1292380.0, ans=0.09899494936611666 2024-08-11 21:22:16,318 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 21:22:22,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1292480.0, ans=0.125 2024-08-11 21:22:46,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1292580.0, ans=0.125 2024-08-11 21:22:47,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1292580.0, ans=0.0 2024-08-11 21:22:51,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1292580.0, ans=0.0 2024-08-11 21:23:12,519 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 21:23:16,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1292780.0, ans=0.125 2024-08-11 21:23:18,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1292780.0, ans=0.125 2024-08-11 21:23:25,060 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13350, loss[loss=0.1074, beats_loss=0.01088, ecapa_loss=0.0002367, whisper_loss=0.09412, over 15703.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01117, ecapa_loss=0.0001902, whisper_loss=0.09279, over 3849425.86 frames. ], batch size: 64, lr: 6.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:23:25,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2024-08-11 21:23:27,908 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-11 21:23:28,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1292880.0, ans=0.125 2024-08-11 21:23:29,361 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 37 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 21:23:30,870 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 21:23:41,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1292980.0, ans=0.0 2024-08-11 21:23:50,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1292980.0, ans=10.0 2024-08-11 21:23:51,921 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 21:23:55,591 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.575e+01 2.972e+01 3.296e+01 7.873e+01, threshold=5.944e+01, percent-clipped=3.0 2024-08-11 21:24:10,288 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-11 21:24:19,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1293180.0, ans=0.0 2024-08-11 21:24:33,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1293280.0, ans=0.0 2024-08-11 21:24:37,108 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13400, loss[loss=0.09501, beats_loss=0.01172, ecapa_loss=0.0001785, whisper_loss=0.08151, over 16209.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01122, ecapa_loss=0.0001904, whisper_loss=0.09247, over 3841803.26 frames. ], batch size: 64, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:24:43,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1293380.0, ans=0.0 2024-08-11 21:25:25,134 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2024-08-11 21:25:31,411 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
21 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-11 21:25:46,669 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13450, loss[loss=0.09776, beats_loss=0.01269, ecapa_loss=0.000222, whisper_loss=0.08286, over 19309.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01119, ecapa_loss=0.0001911, whisper_loss=0.09336, over 3860906.79 frames. ], batch size: 81, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:25:57,508 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 21:25:59,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2024-08-11 21:26:04,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1293980.0, ans=0.0 2024-08-11 21:26:14,043 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-11 21:26:15,018 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.605e+01 2.918e+01 3.272e+01 4.452e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-11 21:26:26,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1294180.0, ans=0.125 2024-08-11 21:26:34,508 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 21:26:42,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1294280.0, ans=0.0 2024-08-11 21:26:44,693 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2024-08-11 21:26:55,022 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13500, loss[loss=0.07865, beats_loss=0.01164, ecapa_loss=0.0001542, whisper_loss=0.06546, over 20272.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.01113, ecapa_loss=0.0001908, whisper_loss=0.09353, over 3866269.60 frames. ], batch size: 79, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:27:12,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1294480.0, ans=0.0 2024-08-11 21:27:20,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1294480.0, ans=10.0 2024-08-11 21:27:20,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1294480.0, ans=0.0 2024-08-11 21:27:38,445 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-11 21:27:57,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1294780.0, ans=0.0 2024-08-11 21:28:00,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-11 21:28:01,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1294780.0, ans=0.125 2024-08-11 21:28:03,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13550, loss[loss=0.1057, beats_loss=0.01265, ecapa_loss=0.0001761, whisper_loss=0.09125, over 22247.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01114, ecapa_loss=0.0001899, whisper_loss=0.09415, over 3877037.24 frames. ], batch size: 92, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:28:05,922 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
17 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 21:28:06,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1294880.0, ans=0.0 2024-08-11 21:28:30,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1295080.0, ans=0.2 2024-08-11 21:28:32,380 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.573e+01 2.919e+01 3.325e+01 1.633e+02, threshold=5.839e+01, percent-clipped=1.0 2024-08-11 21:28:37,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.99 vs. limit=10.0 2024-08-11 21:28:51,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0 2024-08-11 21:29:05,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1295280.0, ans=0.125 2024-08-11 21:29:12,019 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13600, loss[loss=0.09749, beats_loss=0.01259, ecapa_loss=0.0001364, whisper_loss=0.08354, over 16642.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01114, ecapa_loss=0.0001889, whisper_loss=0.09405, over 3860908.91 frames. ], batch size: 61, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:29:19,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1295380.0, ans=0.125 2024-08-11 21:29:28,652 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
23 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 21:29:30,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1295480.0, ans=0.025 2024-08-11 21:29:31,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1295480.0, ans=0.0 2024-08-11 21:29:56,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1295680.0, ans=0.5 2024-08-11 21:30:01,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1295680.0, ans=0.125 2024-08-11 21:30:13,408 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 21:30:15,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1295780.0, ans=0.05 2024-08-11 21:30:19,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2024-08-11 21:30:20,342 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13650, loss[loss=0.1095, beats_loss=0.0117, ecapa_loss=0.0001996, whisper_loss=0.0958, over 21852.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01123, ecapa_loss=0.0001889, whisper_loss=0.09369, over 3877858.13 frames. ], batch size: 91, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:30:22,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1295880.0, ans=0.2 2024-08-11 21:30:24,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1295880.0, ans=0.125 2024-08-11 21:30:31,480 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
21 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-11 21:30:48,564 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.503e+01 2.904e+01 3.318e+01 5.006e+01, threshold=5.809e+01, percent-clipped=0.0 2024-08-11 21:31:01,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1296180.0, ans=0.125 2024-08-11 21:31:23,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1296280.0, ans=0.0 2024-08-11 21:31:28,578 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13700, loss[loss=0.1203, beats_loss=0.009607, ecapa_loss=0.0001828, whisper_loss=0.1089, over 21918.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01126, ecapa_loss=0.0001894, whisper_loss=0.09338, over 3864963.85 frames. ], batch size: 83, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:31:29,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1296380.0, ans=0.07 2024-08-11 21:31:29,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1296380.0, ans=0.0 2024-08-11 21:31:38,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1296380.0, ans=0.125 2024-08-11 21:32:10,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1296680.0, ans=0.125 2024-08-11 21:32:32,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1296780.0, ans=0.125 2024-08-11 21:32:34,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1296780.0, ans=0.0 2024-08-11 21:32:38,436 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13750, loss[loss=0.08461, 
beats_loss=0.0111, ecapa_loss=0.0002426, whisper_loss=0.07108, over 16554.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01122, ecapa_loss=0.0001898, whisper_loss=0.09371, over 3870300.99 frames. ], batch size: 72, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:32:41,805 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.600e+00 2024-08-11 21:32:58,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1296980.0, ans=0.125 2024-08-11 21:33:05,328 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 21:33:07,790 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.561e+01 2.855e+01 3.257e+01 5.078e+01, threshold=5.711e+01, percent-clipped=0.0 2024-08-11 21:33:16,626 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 21:33:21,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1297180.0, ans=10.0 2024-08-11 21:33:29,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1297180.0, ans=0.0 2024-08-11 21:33:37,973 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 21:33:48,639 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13800, loss[loss=0.1079, beats_loss=0.00898, ecapa_loss=0.0001907, whisper_loss=0.09698, over 18601.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01122, ecapa_loss=0.0001888, whisper_loss=0.09324, over 3865379.79 frames. 
], batch size: 77, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:34:00,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1297380.0, ans=0.0 2024-08-11 21:34:02,509 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 21:34:11,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1297480.0, ans=0.0 2024-08-11 21:34:16,655 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 21:34:23,620 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-11 21:34:43,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1297780.0, ans=0.1 2024-08-11 21:34:52,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1297780.0, ans=0.07 2024-08-11 21:34:57,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13850, loss[loss=0.1011, beats_loss=0.009462, ecapa_loss=0.0001983, whisper_loss=0.08961, over 19682.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01118, ecapa_loss=0.0001884, whisper_loss=0.09299, over 3853547.53 frames. 
], batch size: 77, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:35:07,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=1297880.0, ans=12.0 2024-08-11 21:35:14,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1297980.0, ans=0.2 2024-08-11 21:35:15,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1297980.0, ans=0.125 2024-08-11 21:35:25,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1298080.0, ans=0.0 2024-08-11 21:35:26,192 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.750e+01 3.088e+01 3.546e+01 6.102e+01, threshold=6.176e+01, percent-clipped=2.0 2024-08-11 21:35:26,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1298080.0, ans=0.125 2024-08-11 21:35:27,660 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 20 from LS+wenet, 28 from Vox, 48 fro AS 2024-08-11 21:35:37,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1298180.0, ans=0.0 2024-08-11 21:35:45,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1298180.0, ans=0.1 2024-08-11 21:35:52,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1298280.0, ans=0.0 2024-08-11 21:36:01,879 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
32 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 21:36:03,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1298280.0, ans=0.125 2024-08-11 21:36:05,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13900, loss[loss=0.09539, beats_loss=0.01187, ecapa_loss=0.0001903, whisper_loss=0.08162, over 21720.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01114, ecapa_loss=0.0001894, whisper_loss=0.09335, over 3884798.39 frames. ], batch size: 88, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:36:17,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1298380.0, ans=0.125 2024-08-11 21:36:28,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1298480.0, ans=0.1 2024-08-11 21:36:48,821 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 21:37:08,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1298780.0, ans=0.125 2024-08-11 21:37:12,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1298780.0, ans=0.125 2024-08-11 21:37:14,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 13950, loss[loss=0.08392, beats_loss=0.01431, ecapa_loss=0.0001379, whisper_loss=0.06823, over 13658.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.0111, ecapa_loss=0.0001897, whisper_loss=0.09412, over 3881895.48 frames. ], batch size: 53, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:37:36,345 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
29 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 21:37:43,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.710e+01 3.048e+01 3.326e+01 4.854e+01, threshold=6.095e+01, percent-clipped=0.0 2024-08-11 21:37:45,346 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-08-11 21:37:51,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1299080.0, ans=0.0 2024-08-11 21:38:23,843 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 14000, loss[loss=0.08515, beats_loss=0.0131, ecapa_loss=0.000149, whisper_loss=0.07056, over 20842.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01109, ecapa_loss=0.0001882, whisper_loss=0.09448, over 3889380.55 frames. ], batch size: 84, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:38:29,963 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2024-08-11 21:38:53,013 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5 2024-08-11 21:38:57,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2024-08-11 21:38:58,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1299580.0, ans=0.0 2024-08-11 21:38:59,026 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 21:39:09,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1299680.0, ans=0.125 2024-08-11 21:39:09,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2024-08-11 21:39:16,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1299680.0, ans=0.0 2024-08-11 21:39:26,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1299780.0, ans=0.0 2024-08-11 21:39:29,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1299780.0, ans=0.125 2024-08-11 21:39:32,778 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 14050, loss[loss=0.09822, beats_loss=0.01337, ecapa_loss=0.0001681, whisper_loss=0.08317, over 22032.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01115, ecapa_loss=0.0001871, whisper_loss=0.09387, over 3875959.69 frames. ], batch size: 92, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:39:34,297 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 21:39:36,275 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.56 vs. limit=22.5 2024-08-11 21:39:39,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.94 vs. limit=10.0 2024-08-11 21:39:40,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. 
limit=6.0 2024-08-11 21:39:55,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1299980.0, ans=0.125 2024-08-11 21:40:01,049 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.630e+01 2.929e+01 3.311e+01 9.104e+01, threshold=5.859e+01, percent-clipped=1.0 2024-08-11 21:40:01,295 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 21:40:18,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1300180.0, ans=0.2 2024-08-11 21:40:26,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1300280.0, ans=0.2 2024-08-11 21:40:32,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1300280.0, ans=0.0 2024-08-11 21:40:34,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.57 vs. limit=15.0 2024-08-11 21:40:41,591 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 14100, loss[loss=0.1069, beats_loss=0.01088, ecapa_loss=0.0001701, whisper_loss=0.09434, over 16740.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0112, ecapa_loss=0.0001873, whisper_loss=0.09289, over 3832382.57 frames. ], batch size: 63, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:40:43,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1300380.0, ans=0.2 2024-08-11 21:40:50,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.20 vs. limit=15.0 2024-08-11 21:40:51,199 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
31 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 21:40:56,988 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 21:40:58,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1300480.0, ans=0.0 2024-08-11 21:40:59,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1300480.0, ans=0.1 2024-08-11 21:41:01,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1300480.0, ans=0.125 2024-08-11 21:41:09,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1300580.0, ans=0.125 2024-08-11 21:41:23,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1300680.0, ans=0.125 2024-08-11 21:41:26,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1300680.0, ans=0.1 2024-08-11 21:41:27,190 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 21:41:35,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.42 vs. limit=15.0 2024-08-11 21:41:41,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1300780.0, ans=0.125 2024-08-11 21:41:47,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1300780.0, ans=0.125 2024-08-11 21:41:50,723 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 14150, loss[loss=0.1105, beats_loss=0.0109, ecapa_loss=0.0002049, whisper_loss=0.09752, over 17665.00 frames. 
], tot_loss[loss=0.1064, beats_loss=0.01117, ecapa_loss=0.0001889, whisper_loss=0.09331, over 3850415.83 frames. ], batch size: 69, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:41:54,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1300880.0, ans=0.0 2024-08-11 21:42:00,746 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 20 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-11 21:42:13,511 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.84 vs. limit=15.0 2024-08-11 21:42:19,656 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.615e+01 2.850e+01 3.033e+01 5.082e+01, threshold=5.700e+01, percent-clipped=0.0 2024-08-11 21:42:37,584 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-11 21:42:41,960 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 21:42:43,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2024-08-11 21:42:45,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1301280.0, ans=0.1 2024-08-11 21:42:46,216 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 18 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 21:42:59,572 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 14200, loss[loss=0.09986, beats_loss=0.01238, ecapa_loss=0.0002208, whisper_loss=0.08528, over 14064.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01119, ecapa_loss=0.0001881, whisper_loss=0.09292, over 3837870.73 frames. 
], batch size: 57, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:43:01,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1301380.0, ans=0.125 2024-08-11 21:43:05,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.16 vs. limit=10.0 2024-08-11 21:43:06,318 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 21:43:26,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1301580.0, ans=0.0 2024-08-11 21:43:38,030 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 21:43:45,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1301680.0, ans=0.0 2024-08-11 21:43:47,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1301680.0, ans=0.125 2024-08-11 21:44:08,084 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 14250, loss[loss=0.1111, beats_loss=0.01165, ecapa_loss=0.0001394, whisper_loss=0.09811, over 18706.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01115, ecapa_loss=0.0001865, whisper_loss=0.09292, over 3843284.26 frames. 
], batch size: 73, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:44:18,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1301880.0, ans=0.2 2024-08-11 21:44:33,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1301980.0, ans=0.0 2024-08-11 21:44:37,510 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.747e+01 3.033e+01 3.629e+01 5.919e+01, threshold=6.067e+01, percent-clipped=2.0 2024-08-11 21:45:18,001 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 14300, loss[loss=0.0812, beats_loss=0.01321, ecapa_loss=0.0001805, whisper_loss=0.06619, over 15593.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01118, ecapa_loss=0.0001854, whisper_loss=0.093, over 3839357.57 frames. ], batch size: 63, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:45:18,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1302380.0, ans=0.1 2024-08-11 21:45:18,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1302380.0, ans=0.125 2024-08-11 21:45:37,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1302480.0, ans=0.0 2024-08-11 21:45:41,826 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 21:45:47,814 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 21:45:52,935 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
24 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 21:45:53,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1302580.0, ans=0.025 2024-08-11 21:46:02,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1302680.0, ans=0.5 2024-08-11 21:46:13,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1302780.0, ans=0.0 2024-08-11 21:46:14,827 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 21:46:17,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1302780.0, ans=10.0 2024-08-11 21:46:27,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 14350, loss[loss=0.09715, beats_loss=0.0122, ecapa_loss=0.0002328, whisper_loss=0.08262, over 15961.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01122, ecapa_loss=0.0001852, whisper_loss=0.09289, over 3860343.45 frames. ], batch size: 67, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:46:32,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1302880.0, ans=0.0 2024-08-11 21:46:37,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. 
limit=15.0 2024-08-11 21:46:54,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1303080.0, ans=0.125 2024-08-11 21:46:56,029 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.719e+01 2.981e+01 3.464e+01 5.321e+01, threshold=5.963e+01, percent-clipped=0.0 2024-08-11 21:47:19,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1303180.0, ans=0.125 2024-08-11 21:47:31,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1303280.0, ans=0.1 2024-08-11 21:47:37,024 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 14400, loss[loss=0.105, beats_loss=0.01266, ecapa_loss=0.0002185, whisper_loss=0.09012, over 17577.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0112, ecapa_loss=0.0001877, whisper_loss=0.09224, over 3870225.03 frames. ], batch size: 74, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:47:39,544 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.40 vs. 
limit=15.0 2024-08-11 21:47:53,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1303480.0, ans=0.1 2024-08-11 21:47:54,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1303480.0, ans=0.125 2024-08-11 21:48:12,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1303580.0, ans=0.1 2024-08-11 21:48:16,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1303580.0, ans=0.125 2024-08-11 21:48:22,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1303680.0, ans=0.0 2024-08-11 21:48:22,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1303680.0, ans=0.125 2024-08-11 21:48:34,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1303780.0, ans=0.125 2024-08-11 21:48:36,768 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-11 21:48:37,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1303780.0, ans=0.2 2024-08-11 21:48:38,701 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2024-08-11 21:48:45,911 INFO [train_multi_KD3.py:1116] (2/4) Epoch 9, batch 14450, loss[loss=0.08606, beats_loss=0.01251, ecapa_loss=0.0001705, whisper_loss=0.07184, over 22356.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01115, ecapa_loss=0.0001891, whisper_loss=0.09293, over 3869041.02 frames. 
], batch size: 90, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:48:58,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1303980.0, ans=0.1 2024-08-11 21:49:12,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1304080.0, ans=0.125 2024-08-11 21:49:13,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.624e+01 2.900e+01 3.333e+01 5.803e+01, threshold=5.799e+01, percent-clipped=0.0 2024-08-11 21:49:17,678 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-11 21:49:22,988 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-11 21:49:31,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1304180.0, ans=0.07 2024-08-11 21:49:32,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.86 vs. limit=10.0 2024-08-11 21:50:30,448 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 0, loss[loss=0.113, beats_loss=0.01153, ecapa_loss=0.0002041, whisper_loss=0.09939, over 22602.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01153, ecapa_loss=0.0002041, whisper_loss=0.09939, over 22602.00 frames. ], batch size: 87, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:50:30,448 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-11 21:51:13,096 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on ASR_libri: loss=0.2568, beats_loss=0, ecapa_loss=0.0006206, whisper_loss=0.2506, over 922467.00 frames. 
2024-08-11 21:51:29,344 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on SV_voxceleb1: loss=0.005051, beats_loss=0, ecapa_loss=0.0005051, whisper_loss=0, over 939242.00 frames. 2024-08-11 21:53:17,167 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2202, 3.9872, 3.4878, 3.6049], device='cuda:2') 2024-08-11 21:53:33,445 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on AT_audioset: loss=0.02495, beats_loss=0.02495, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 21:53:33,449 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-11 21:53:35,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1304320.0, ans=0.0 2024-08-11 21:53:38,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1304320.0, ans=0.0 2024-08-11 21:53:59,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1304420.0, ans=0.125 2024-08-11 21:53:59,550 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.16 vs. limit=6.0 2024-08-11 21:54:23,326 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 32 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-11 21:54:28,446 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
17 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-11 21:54:47,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1304520.0, ans=0.125 2024-08-11 21:54:54,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1304620.0, ans=0.2 2024-08-11 21:55:14,712 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 38 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 21:55:14,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1304620.0, ans=0.125 2024-08-11 21:55:17,457 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 21:55:43,135 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 50, loss[loss=0.1162, beats_loss=0.01263, ecapa_loss=0.0001749, whisper_loss=0.1018, over 22826.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01078, ecapa_loss=0.000186, whisper_loss=0.09403, over 890262.65 frames. ], batch size: 93, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:56:00,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1304820.0, ans=0.125 2024-08-11 21:56:10,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.47 vs. 
limit=22.5 2024-08-11 21:56:47,442 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.905e+01 3.307e+01 3.702e+01 5.786e+01, threshold=6.614e+01, percent-clipped=0.0 2024-08-11 21:56:56,716 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 21:57:01,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1305120.0, ans=0.125 2024-08-11 21:57:21,148 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 21:57:29,418 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2024-08-11 21:57:29,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1305220.0, ans=22.5 2024-08-11 21:57:37,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1305220.0, ans=0.125 2024-08-11 21:57:41,637 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 100, loss[loss=0.09409, beats_loss=0.007372, ecapa_loss=0.0002405, whisper_loss=0.08431, over 15856.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01064, ecapa_loss=0.0001883, whisper_loss=0.09346, over 1555501.58 frames. 
], batch size: 64, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:58:17,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1305420.0, ans=0.0 2024-08-11 21:58:30,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=1305520.0, ans=0.2 2024-08-11 21:58:58,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=12.0 2024-08-11 21:59:04,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1305620.0, ans=0.125 2024-08-11 21:59:10,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.24 vs. limit=10.0 2024-08-11 21:59:23,867 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0 2024-08-11 21:59:32,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 150, loss[loss=0.0749, beats_loss=0.0119, ecapa_loss=0.0001934, whisper_loss=0.06106, over 15884.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01071, ecapa_loss=0.0001879, whisper_loss=0.09141, over 2063759.10 frames. ], batch size: 66, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:00:07,237 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.12 vs. 
limit=15.0 2024-08-11 22:00:08,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1306020.0, ans=0.125 2024-08-11 22:00:20,120 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.798e+01 3.187e+01 3.633e+01 2.129e+02, threshold=6.375e+01, percent-clipped=1.0 2024-08-11 22:00:26,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=6.0 2024-08-11 22:00:41,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1306220.0, ans=0.125 2024-08-11 22:00:51,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1306220.0, ans=0.0 2024-08-11 22:00:53,799 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 32 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 22:00:58,141 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 200, loss[loss=0.0982, beats_loss=0.01093, ecapa_loss=0.0001827, whisper_loss=0.08545, over 15191.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01066, ecapa_loss=0.0001887, whisper_loss=0.09152, over 2439559.45 frames. ], batch size: 60, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:00:59,869 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
25 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 22:01:17,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1306420.0, ans=0.2 2024-08-11 22:01:28,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1306520.0, ans=0.1 2024-08-11 22:01:58,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1306720.0, ans=0.125 2024-08-11 22:02:02,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.19 vs. limit=8.0 2024-08-11 22:02:06,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1306720.0, ans=0.125 2024-08-11 22:02:09,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1306720.0, ans=0.125 2024-08-11 22:02:10,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1306720.0, ans=0.125 2024-08-11 22:02:14,258 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 250, loss[loss=0.09165, beats_loss=0.01078, ecapa_loss=0.0001732, whisper_loss=0.07914, over 18168.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01064, ecapa_loss=0.0001887, whisper_loss=0.09168, over 2729307.28 frames. 
], batch size: 71, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:02:18,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1306820.0, ans=0.125 2024-08-11 22:02:32,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1306920.0, ans=0.125 2024-08-11 22:02:41,000 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-11 22:02:57,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.454e+01 2.692e+01 3.153e+01 8.296e+01, threshold=5.384e+01, percent-clipped=2.0 2024-08-11 22:02:59,526 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 29 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 22:02:59,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1307020.0, ans=0.125 2024-08-11 22:03:10,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1307120.0, ans=0.95 2024-08-11 22:03:16,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2024-08-11 22:03:17,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1307220.0, ans=0.125 2024-08-11 22:03:30,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1307320.0, ans=0.0 2024-08-11 22:03:31,177 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 300, loss[loss=0.09053, beats_loss=0.01062, ecapa_loss=0.0002076, whisper_loss=0.07783, over 17825.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01074, ecapa_loss=0.0001886, whisper_loss=0.09207, over 2963076.48 frames. ], batch size: 75, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:04:01,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1307520.0, ans=0.125 2024-08-11 22:04:05,360 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 22:04:11,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1307520.0, ans=0.125 2024-08-11 22:04:24,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1307620.0, ans=0.125 2024-08-11 22:04:29,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1307620.0, ans=0.2 2024-08-11 22:04:46,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 350, loss[loss=0.08621, beats_loss=0.01092, ecapa_loss=0.000186, whisper_loss=0.07343, over 21045.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.0001883, whisper_loss=0.0915, over 3153985.38 frames. ], batch size: 82, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:04:57,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1307820.0, ans=0.125 2024-08-11 22:04:59,661 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 22:05:24,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.36 vs. 
limit=22.5 2024-08-11 22:05:26,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.513e+01 2.913e+01 3.282e+01 4.748e+01, threshold=5.825e+01, percent-clipped=0.0 2024-08-11 22:05:39,545 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 22:05:41,647 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-11 22:05:54,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1308220.0, ans=0.0 2024-08-11 22:06:00,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1308320.0, ans=0.0 2024-08-11 22:06:01,731 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 400, loss[loss=0.1089, beats_loss=0.007348, ecapa_loss=0.0002267, whisper_loss=0.09926, over 22645.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01082, ecapa_loss=0.0001876, whisper_loss=0.09216, over 3281020.63 frames. ], batch size: 90, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:06:11,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1308320.0, ans=0.125 2024-08-11 22:06:17,993 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.722e-01 2024-08-11 22:06:19,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1308420.0, ans=0.1 2024-08-11 22:06:35,301 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-11 22:06:38,509 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
36 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 22:06:54,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1308620.0, ans=0.1 2024-08-11 22:07:01,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1308720.0, ans=0.2 2024-08-11 22:07:17,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=22.5 2024-08-11 22:07:17,930 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 450, loss[loss=0.09994, beats_loss=0.01332, ecapa_loss=0.0001691, whisper_loss=0.08494, over 20800.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001876, whisper_loss=0.09156, over 3399310.73 frames. ], batch size: 83, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:07:18,129 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 22:07:58,806 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.488e+01 3.017e+01 3.515e+01 8.522e+01, threshold=6.035e+01, percent-clipped=1.0 2024-08-11 22:08:03,721 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=15.0 2024-08-11 22:08:12,081 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-11 22:08:18,634 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-11 22:08:33,623 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 500, loss[loss=0.1333, beats_loss=0.009328, ecapa_loss=0.000227, whisper_loss=0.1217, over 18957.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01089, ecapa_loss=0.0001881, whisper_loss=0.09115, over 3471315.53 frames. 
], batch size: 74, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:08:38,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1309320.0, ans=0.125 2024-08-11 22:08:45,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1309320.0, ans=0.125 2024-08-11 22:08:47,025 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-11 22:08:47,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1309420.0, ans=0.0 2024-08-11 22:08:56,404 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 22:09:18,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1309620.0, ans=0.0 2024-08-11 22:09:20,586 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 22:09:21,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1309620.0, ans=0.125 2024-08-11 22:09:37,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1309720.0, ans=0.1 2024-08-11 22:09:38,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1309720.0, ans=0.2 2024-08-11 22:09:40,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1309720.0, ans=0.1 2024-08-11 22:09:47,486 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.52 vs. 
limit=15.0 2024-08-11 22:09:50,827 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 550, loss[loss=0.123, beats_loss=0.006985, ecapa_loss=0.0001797, whisper_loss=0.1142, over 14862.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.0001888, whisper_loss=0.09132, over 3557218.27 frames. ], batch size: 54, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:09:51,030 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-11 22:10:17,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1309920.0, ans=0.125 2024-08-11 22:10:31,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1310020.0, ans=0.125 2024-08-11 22:10:32,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.567e+01 3.001e+01 3.540e+01 6.068e+01, threshold=6.003e+01, percent-clipped=1.0 2024-08-11 22:10:32,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1310020.0, ans=0.0 2024-08-11 22:10:35,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1310120.0, ans=0.05 2024-08-11 22:10:38,286 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-11 22:10:40,173 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 22:10:42,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1310120.0, ans=0.05 2024-08-11 22:10:44,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.24 vs. 
limit=22.5 2024-08-11 22:10:58,115 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 18 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 22:11:07,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 600, loss[loss=0.1015, beats_loss=0.0114, ecapa_loss=0.000203, whisper_loss=0.08808, over 16172.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01088, ecapa_loss=0.0001862, whisper_loss=0.09194, over 3618456.52 frames. ], batch size: 63, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:11:08,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1310320.0, ans=0.125 2024-08-11 22:11:11,904 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-11 22:11:25,859 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 22:11:27,896 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0 2024-08-11 22:11:58,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.94 vs. limit=22.5 2024-08-11 22:11:58,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-08-11 22:12:00,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1310620.0, ans=0.125 2024-08-11 22:12:12,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. 
limit=15.0 2024-08-11 22:12:23,994 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 650, loss[loss=0.1172, beats_loss=0.00964, ecapa_loss=0.0001823, whisper_loss=0.1058, over 21415.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01091, ecapa_loss=0.0001848, whisper_loss=0.09237, over 3675579.27 frames. ], batch size: 80, lr: 6.47e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:12:31,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5 2024-08-11 22:12:35,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1310820.0, ans=0.125 2024-08-11 22:12:43,919 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-11 22:12:53,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.17 vs. limit=6.0 2024-08-11 22:13:03,573 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-11 22:13:04,480 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.576e+01 2.785e+01 3.016e+01 3.995e+01, threshold=5.570e+01, percent-clipped=0.0 2024-08-11 22:13:14,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1311120.0, ans=0.125 2024-08-11 22:13:19,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=21.31 vs. limit=15.0 2024-08-11 22:13:32,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.52 vs. 
limit=22.5 2024-08-11 22:13:40,375 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 700, loss[loss=0.1163, beats_loss=0.008289, ecapa_loss=0.0001966, whisper_loss=0.1061, over 22687.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01095, ecapa_loss=0.0001844, whisper_loss=0.09249, over 3728002.25 frames. ], batch size: 87, lr: 6.47e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:13:51,557 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 22:14:10,782 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 22:14:16,481 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 28 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 22:14:38,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1311620.0, ans=0.0 2024-08-11 22:14:42,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1311720.0, ans=0.125 2024-08-11 22:14:44,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.93 vs. limit=6.0 2024-08-11 22:14:51,946 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 22:14:54,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1311720.0, ans=0.0 2024-08-11 22:15:00,345 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 750, loss[loss=0.08527, beats_loss=0.01181, ecapa_loss=0.000154, whisper_loss=0.07192, over 16885.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01099, ecapa_loss=0.000183, whisper_loss=0.09204, over 3738688.32 frames. ], batch size: 65, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:15:00,490 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
21 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-11 22:15:20,657 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 22:15:20,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1311920.0, ans=0.0 2024-08-11 22:15:24,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1311920.0, ans=0.02 2024-08-11 22:15:41,507 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.583e+01 2.835e+01 3.303e+01 6.155e+01, threshold=5.670e+01, percent-clipped=2.0 2024-08-11 22:16:00,846 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 22:16:17,547 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 800, loss[loss=0.132, beats_loss=0.01039, ecapa_loss=0.0001784, whisper_loss=0.1198, over 21314.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01092, ecapa_loss=0.0001839, whisper_loss=0.09262, over 3745651.33 frames. ], batch size: 83, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:16:18,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1312320.0, ans=0.2 2024-08-11 22:16:42,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1312420.0, ans=0.0 2024-08-11 22:16:52,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1312420.0, ans=0.1 2024-08-11 22:17:05,956 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
20 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 22:17:19,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1312620.0, ans=0.125 2024-08-11 22:17:23,955 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 12 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 22:17:27,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1312620.0, ans=0.0 2024-08-11 22:17:31,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1312720.0, ans=0.5 2024-08-11 22:17:52,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 850, loss[loss=0.1054, beats_loss=0.01082, ecapa_loss=0.0002473, whisper_loss=0.09215, over 14918.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01095, ecapa_loss=0.0001853, whisper_loss=0.09191, over 3768277.64 frames. ], batch size: 59, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:17:59,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.65 vs. limit=22.5 2024-08-11 22:18:04,049 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 22:18:27,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1313020.0, ans=0.0 2024-08-11 22:18:28,819 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 22:18:29,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-11 22:18:32,359 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
23 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 22:18:41,117 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.539e+01 2.872e+01 3.296e+01 5.215e+01, threshold=5.743e+01, percent-clipped=0.0 2024-08-11 22:18:46,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1313120.0, ans=0.125 2024-08-11 22:18:59,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1313120.0, ans=0.0 2024-08-11 22:19:16,589 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 22:19:25,984 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 900, loss[loss=0.0988, beats_loss=0.01106, ecapa_loss=0.0001877, whisper_loss=0.08586, over 14101.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01094, ecapa_loss=0.0001844, whisper_loss=0.09199, over 3772673.42 frames. ], batch size: 56, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:19:31,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1313320.0, ans=0.0 2024-08-11 22:19:42,108 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 22:19:55,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1313420.0, ans=0.125 2024-08-11 22:19:55,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0 2024-08-11 22:20:05,099 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 22:20:23,767 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:20:24,170 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2024-08-11 22:21:02,290 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 950, loss[loss=0.081, beats_loss=0.0155, ecapa_loss=0.0001609, whisper_loss=0.06389, over 23242.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01096, ecapa_loss=0.0001831, whisper_loss=0.09159, over 3769429.18 frames. ], batch size: 93, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:21:18,901 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 22:21:19,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1313920.0, ans=0.125 2024-08-11 22:21:35,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1313920.0, ans=0.2 2024-08-11 22:21:39,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2024-08-11 22:21:45,678 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 22:21:51,533 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.622e+01 2.859e+01 3.329e+01 4.580e+01, threshold=5.718e+01, percent-clipped=0.0 2024-08-11 22:22:13,914 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-08-11 22:22:27,449 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 22:22:30,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1314220.0, ans=0.015 2024-08-11 22:22:30,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1314220.0, ans=0.125 2024-08-11 22:22:31,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1314220.0, ans=0.0 2024-08-11 22:22:33,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1314320.0, ans=0.0 2024-08-11 22:22:34,438 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1000, loss[loss=0.08812, beats_loss=0.0147, ecapa_loss=0.0001831, whisper_loss=0.07158, over 16977.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001831, whisper_loss=0.09187, over 3797573.08 frames. ], batch size: 68, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:22:47,999 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 23 from Vox, 17 fro AS 2024-08-11 22:22:50,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-08-11 22:22:57,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1314420.0, ans=0.0 2024-08-11 22:23:27,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1314620.0, ans=0.05 2024-08-11 22:23:43,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1314720.0, ans=0.0 2024-08-11 22:23:47,285 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-11 22:24:01,297 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1050, loss[loss=0.08257, beats_loss=0.01379, ecapa_loss=0.0001632, whisper_loss=0.06715, over 16703.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01105, ecapa_loss=0.0001821, whisper_loss=0.09178, over 3826254.22 frames. ], batch size: 67, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:24:07,499 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2024-08-11 22:24:25,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1314920.0, ans=0.0 2024-08-11 22:24:32,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1315020.0, ans=0.0 2024-08-11 22:24:38,356 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 2.421e+01 2.684e+01 3.099e+01 9.894e+01, threshold=5.368e+01, percent-clipped=2.0 2024-08-11 22:24:45,322 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 22:24:51,945 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 22:25:09,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1100, loss[loss=0.1343, beats_loss=0.009927, ecapa_loss=0.0001825, whisper_loss=0.1226, over 21525.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01097, ecapa_loss=0.0001834, whisper_loss=0.09228, over 3807951.82 frames. 
], batch size: 83, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:25:16,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1315320.0, ans=0.2 2024-08-11 22:25:19,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1315320.0, ans=0.0 2024-08-11 22:25:20,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1315320.0, ans=0.125 2024-08-11 22:25:26,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1315420.0, ans=0.0 2024-08-11 22:25:40,762 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 22:25:41,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1315520.0, ans=0.125 2024-08-11 22:25:42,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1315520.0, ans=0.09899494936611666 2024-08-11 22:25:57,502 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 22:26:00,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1315620.0, ans=0.125 2024-08-11 22:26:09,541 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 12 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-11 22:26:17,879 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1150, loss[loss=0.09472, beats_loss=0.01352, ecapa_loss=0.0001697, whisper_loss=0.07951, over 19404.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.0001829, whisper_loss=0.09202, over 3806961.31 frames. 
], batch size: 80, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:26:22,380 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 22:26:36,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=12.0 2024-08-11 22:26:42,836 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 22:26:46,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1316020.0, ans=0.0 2024-08-11 22:26:47,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1316020.0, ans=0.125 2024-08-11 22:26:54,931 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.627e+01 2.933e+01 3.282e+01 4.582e+01, threshold=5.866e+01, percent-clipped=0.0 2024-08-11 22:27:06,440 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-11 22:27:14,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1316220.0, ans=10.0 2024-08-11 22:27:26,986 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1200, loss[loss=0.1049, beats_loss=0.01082, ecapa_loss=0.0002213, whisper_loss=0.09186, over 21896.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001817, whisper_loss=0.09149, over 3779657.24 frames. ], batch size: 91, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:27:27,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1316320.0, ans=0.1 2024-08-11 22:27:38,330 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.03 vs. 
limit=15.0 2024-08-11 22:27:41,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1316420.0, ans=0.2 2024-08-11 22:27:57,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1316520.0, ans=0.035 2024-08-11 22:28:24,689 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-11 22:28:29,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.29 vs. limit=15.0 2024-08-11 22:28:34,680 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 22:28:35,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1250, loss[loss=0.0962, beats_loss=0.01359, ecapa_loss=0.0001678, whisper_loss=0.08093, over 18640.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01104, ecapa_loss=0.0001818, whisper_loss=0.09156, over 3761239.47 frames. ], batch size: 78, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:28:36,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1316820.0, ans=0.125 2024-08-11 22:28:39,523 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-11 22:28:40,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1316820.0, ans=0.2 2024-08-11 22:28:42,736 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 22:29:11,882 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.416e+01 2.652e+01 2.971e+01 4.212e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-11 22:29:16,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1317120.0, ans=0.05 2024-08-11 22:29:26,333 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0 2024-08-11 22:29:40,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1317220.0, ans=0.125 2024-08-11 22:29:43,169 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1300, loss[loss=0.1054, beats_loss=0.0123, ecapa_loss=0.0001463, whisper_loss=0.09161, over 19807.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01107, ecapa_loss=0.0001809, whisper_loss=0.09126, over 3794610.75 frames. ], batch size: 78, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:29:51,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0 2024-08-11 22:30:08,474 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 22:30:11,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1317520.0, ans=0.5 2024-08-11 22:30:26,135 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 27 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-11 22:30:40,965 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-11 22:30:45,010 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
22 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 22:30:51,458 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1350, loss[loss=0.09477, beats_loss=0.01365, ecapa_loss=0.000157, whisper_loss=0.07956, over 21288.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01109, ecapa_loss=0.0001804, whisper_loss=0.09149, over 3821900.05 frames. ], batch size: 86, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:30:55,073 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2024-08-11 22:31:00,275 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 22:31:08,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1317920.0, ans=0.125 2024-08-11 22:31:09,803 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 22:31:18,369 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-11 22:31:28,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1318020.0, ans=0.1 2024-08-11 22:31:28,844 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.605e+01 2.891e+01 3.294e+01 5.251e+01, threshold=5.782e+01, percent-clipped=0.0 2024-08-11 22:31:29,153 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 13 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-11 22:31:44,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.34 vs. limit=10.0 2024-08-11 22:31:56,548 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 22:31:56,859 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.859e-01 2024-08-11 22:32:00,648 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1400, loss[loss=0.1089, beats_loss=0.01035, ecapa_loss=0.0001733, whisper_loss=0.09683, over 22572.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01097, ecapa_loss=0.0001807, whisper_loss=0.09179, over 3821109.05 frames. ], batch size: 88, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:32:04,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1318320.0, ans=0.1 2024-08-11 22:32:19,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1318420.0, ans=0.125 2024-08-11 22:32:23,387 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 26 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-11 22:32:27,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1318420.0, ans=0.1 2024-08-11 22:32:30,662 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 22:32:45,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1318620.0, ans=0.0 2024-08-11 22:33:04,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1318720.0, ans=0.125 2024-08-11 22:33:11,091 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1450, loss[loss=0.08565, beats_loss=0.01481, ecapa_loss=0.0001557, whisper_loss=0.06929, over 19991.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01101, ecapa_loss=0.0001796, whisper_loss=0.09158, over 3823983.30 frames. 
], batch size: 81, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:33:45,704 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 22:34:01,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1318920.0, ans=0.125 2024-08-11 22:34:10,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1319020.0, ans=0.125 2024-08-11 22:34:12,983 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.426e+01 2.679e+01 3.124e+01 8.618e+01, threshold=5.357e+01, percent-clipped=1.0 2024-08-11 22:34:24,669 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 27 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 22:34:31,699 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 38 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 22:34:41,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2024-08-11 22:34:42,515 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=12.0 2024-08-11 22:34:43,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1319220.0, ans=0.2 2024-08-11 22:34:45,968 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1500, loss[loss=0.09618, beats_loss=0.01049, ecapa_loss=0.0001492, whisper_loss=0.08419, over 15778.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01104, ecapa_loss=0.0001797, whisper_loss=0.09085, over 3820768.84 frames. 
], batch size: 58, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:35:26,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1319520.0, ans=0.125 2024-08-11 22:35:28,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1319520.0, ans=0.125 2024-08-11 22:35:32,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1319620.0, ans=0.125 2024-08-11 22:35:37,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1319620.0, ans=0.125 2024-08-11 22:35:39,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1319620.0, ans=0.2 2024-08-11 22:35:51,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1319720.0, ans=0.0 2024-08-11 22:35:54,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1319720.0, ans=0.05 2024-08-11 22:35:57,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1319820.0, ans=0.0 2024-08-11 22:35:57,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1550, loss[loss=0.1169, beats_loss=0.009385, ecapa_loss=0.0002125, whisper_loss=0.1054, over 22133.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01109, ecapa_loss=0.0001783, whisper_loss=0.09065, over 3789689.17 frames. 
], batch size: 91, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:35:58,521 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:35:59,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1319820.0, ans=0.125 2024-08-11 22:36:13,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1319920.0, ans=0.1 2024-08-11 22:36:26,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-08-11 22:36:28,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1320020.0, ans=0.125 2024-08-11 22:36:29,756 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 22:36:37,882 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.581e+01 2.864e+01 3.252e+01 1.978e+02, threshold=5.728e+01, percent-clipped=3.0 2024-08-11 22:36:46,112 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-11 22:36:58,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=12.0 2024-08-11 22:37:06,749 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 22:37:09,288 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1600, loss[loss=0.07505, beats_loss=0.01088, ecapa_loss=0.0001861, whisper_loss=0.06232, over 16406.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01099, ecapa_loss=0.0001789, whisper_loss=0.09132, over 3797824.08 frames. 
], batch size: 66, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:37:36,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1320520.0, ans=0.0 2024-08-11 22:37:40,150 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 16 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 22:37:42,281 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2024-08-11 22:37:43,346 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.38 vs. limit=15.0 2024-08-11 22:38:03,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1320720.0, ans=0.1 2024-08-11 22:38:17,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1650, loss[loss=0.09781, beats_loss=0.01182, ecapa_loss=0.0001679, whisper_loss=0.08431, over 22195.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01098, ecapa_loss=0.000179, whisper_loss=0.09226, over 3844303.81 frames. ], batch size: 89, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:38:31,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1320920.0, ans=0.0 2024-08-11 22:38:33,202 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-11 22:38:52,483 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
20 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 22:38:55,185 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.537e+01 2.816e+01 3.147e+01 5.584e+01, threshold=5.632e+01, percent-clipped=0.0 2024-08-11 22:38:55,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1321020.0, ans=0.0 2024-08-11 22:39:08,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1321120.0, ans=0.125 2024-08-11 22:39:27,083 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1700, loss[loss=0.1106, beats_loss=0.0101, ecapa_loss=0.0001826, whisper_loss=0.09862, over 16166.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01098, ecapa_loss=0.0001789, whisper_loss=0.09218, over 3849082.32 frames. ], batch size: 62, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:39:27,359 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-11 22:39:35,427 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 22:39:38,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1321320.0, ans=0.1 2024-08-11 22:39:39,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1321420.0, ans=0.1 2024-08-11 22:39:52,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-11 22:40:05,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.22 vs. 
limit=15.0 2024-08-11 22:40:19,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1321620.0, ans=0.0 2024-08-11 22:40:32,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1321720.0, ans=0.125 2024-08-11 22:40:35,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1750, loss[loss=0.1033, beats_loss=0.01103, ecapa_loss=0.0002016, whisper_loss=0.09021, over 21332.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01101, ecapa_loss=0.0001785, whisper_loss=0.0919, over 3851211.18 frames. ], batch size: 89, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:40:41,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1321820.0, ans=0.2 2024-08-11 22:40:49,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1321920.0, ans=0.125 2024-08-11 22:40:56,075 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-11 22:41:13,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.558e+01 2.941e+01 3.375e+01 5.382e+01, threshold=5.883e+01, percent-clipped=0.0 2024-08-11 22:41:45,283 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1800, loss[loss=0.1086, beats_loss=0.009516, ecapa_loss=0.0001728, whisper_loss=0.09739, over 23305.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01092, ecapa_loss=0.0001792, whisper_loss=0.09238, over 3855409.45 frames. ], batch size: 92, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:42:02,384 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 22:42:19,910 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. 
limit=8.0 2024-08-11 22:42:20,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1322520.0, ans=0.125 2024-08-11 22:42:44,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1322720.0, ans=0.1 2024-08-11 22:42:44,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.40 vs. limit=15.0 2024-08-11 22:42:55,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1322720.0, ans=0.1 2024-08-11 22:42:58,183 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1850, loss[loss=0.1072, beats_loss=0.0114, ecapa_loss=0.0001707, whisper_loss=0.09413, over 23587.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01093, ecapa_loss=0.0001788, whisper_loss=0.09247, over 3850785.58 frames. ], batch size: 91, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:43:16,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1322920.0, ans=0.0 2024-08-11 22:43:19,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1322920.0, ans=0.125 2024-08-11 22:43:39,855 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.593e+01 2.917e+01 3.347e+01 7.328e+01, threshold=5.834e+01, percent-clipped=1.0 2024-08-11 22:43:54,717 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-11 22:43:58,824 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
23 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-11 22:44:07,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1323220.0, ans=0.125 2024-08-11 22:44:12,219 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1900, loss[loss=0.1053, beats_loss=0.01387, ecapa_loss=0.0001694, whisper_loss=0.08971, over 22299.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01105, ecapa_loss=0.0001793, whisper_loss=0.09219, over 3872168.67 frames. ], batch size: 87, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:44:16,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.95 vs. limit=6.0 2024-08-11 22:44:18,722 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 22:44:25,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1323320.0, ans=0.0 2024-08-11 22:44:27,225 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 22:44:49,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1323520.0, ans=0.125 2024-08-11 22:45:01,749 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 22:45:06,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1323620.0, ans=0.0 2024-08-11 22:45:20,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1323720.0, ans=0.2 2024-08-11 22:45:21,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1323720.0, ans=0.1 2024-08-11 22:45:24,453 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 1950, loss[loss=0.1027, beats_loss=0.01183, ecapa_loss=0.0002114, whisper_loss=0.08878, over 17771.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01101, ecapa_loss=0.0001818, whisper_loss=0.09244, over 3837935.62 frames. ], batch size: 73, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:45:32,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1323820.0, ans=0.125 2024-08-11 22:45:38,115 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 22:45:45,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1323920.0, ans=0.125 2024-08-11 22:45:46,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.39 vs. limit=15.0 2024-08-11 22:46:02,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.530e+01 2.923e+01 3.581e+01 1.963e+02, threshold=5.846e+01, percent-clipped=3.0 2024-08-11 22:46:34,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1324220.0, ans=0.0 2024-08-11 22:46:35,450 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
25 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-11 22:46:35,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1324320.0, ans=0.125 2024-08-11 22:46:36,553 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2000, loss[loss=0.09192, beats_loss=0.01332, ecapa_loss=0.0001834, whisper_loss=0.07677, over 22591.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01095, ecapa_loss=0.000183, whisper_loss=0.09288, over 3852658.49 frames. ], batch size: 94, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:46:37,432 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.03 vs. limit=10.0 2024-08-11 22:46:38,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1324320.0, ans=0.2 2024-08-11 22:46:45,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1324320.0, ans=0.125 2024-08-11 22:46:46,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1324320.0, ans=0.1 2024-08-11 22:46:49,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0 2024-08-11 22:46:58,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1324420.0, ans=0.0 2024-08-11 22:46:59,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1324420.0, ans=0.2 2024-08-11 22:47:45,461 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
19 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 22:47:51,929 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2050, loss[loss=0.1298, beats_loss=0.00889, ecapa_loss=0.0001902, whisper_loss=0.119, over 23848.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01105, ecapa_loss=0.0001839, whisper_loss=0.0917, over 3876193.22 frames. ], batch size: 95, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:48:20,927 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 22:48:30,714 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.631e+01 3.014e+01 3.370e+01 4.766e+01, threshold=6.027e+01, percent-clipped=0.0 2024-08-11 22:48:31,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1325020.0, ans=0.1 2024-08-11 22:48:48,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.41 vs. limit=15.0 2024-08-11 22:49:03,221 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2100, loss[loss=0.09701, beats_loss=0.01146, ecapa_loss=0.0001887, whisper_loss=0.08366, over 22686.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01119, ecapa_loss=0.0001821, whisper_loss=0.09137, over 3844594.65 frames. ], batch size: 93, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:49:05,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1325320.0, ans=0.125 2024-08-11 22:49:08,542 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 22:49:25,276 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.841e-01 2024-08-11 22:49:34,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1325520.0, ans=0.0 2024-08-11 22:49:35,259 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.936e+05 2024-08-11 22:49:48,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1325620.0, ans=0.2 2024-08-11 22:49:59,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1325620.0, ans=0.125 2024-08-11 22:50:00,648 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 22:50:08,846 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.22 vs. limit=10.0 2024-08-11 22:50:18,669 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2150, loss[loss=0.09005, beats_loss=0.01221, ecapa_loss=0.0001945, whisper_loss=0.0759, over 12944.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01109, ecapa_loss=0.0001836, whisper_loss=0.09272, over 3862622.02 frames. ], batch size: 54, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:50:24,862 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
32 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 22:50:31,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1325920.0, ans=0.0 2024-08-11 22:50:40,309 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:50:45,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1325920.0, ans=0.125 2024-08-11 22:50:51,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-11 22:50:57,003 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.560e+01 2.828e+01 3.267e+01 5.795e+01, threshold=5.656e+01, percent-clipped=0.0 2024-08-11 22:51:05,247 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 22:51:06,161 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-11 22:51:20,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2024-08-11 22:51:31,049 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2200, loss[loss=0.08067, beats_loss=0.01152, ecapa_loss=0.0001964, whisper_loss=0.06719, over 18546.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01108, ecapa_loss=0.0001839, whisper_loss=0.09286, over 3822847.70 frames. 
], batch size: 79, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:51:34,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1326320.0, ans=0.2 2024-08-11 22:51:51,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.64 vs. limit=15.0 2024-08-11 22:51:55,841 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 22:52:01,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1326520.0, ans=0.125 2024-08-11 22:52:11,202 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 22:52:11,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1326520.0, ans=0.0 2024-08-11 22:52:23,916 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.93 vs. limit=22.5 2024-08-11 22:52:27,882 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-11 22:52:36,736 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 22:52:43,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1326820.0, ans=0.2 2024-08-11 22:52:44,234 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2250, loss[loss=0.09549, beats_loss=0.01205, ecapa_loss=0.0001831, whisper_loss=0.08161, over 20251.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01111, ecapa_loss=0.000184, whisper_loss=0.09331, over 3834004.47 frames. 
], batch size: 81, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:52:56,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2024-08-11 22:52:59,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1326920.0, ans=0.07 2024-08-11 22:53:07,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1326920.0, ans=0.95 2024-08-11 22:53:18,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-08-11 22:53:24,638 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.654e+01 2.933e+01 3.292e+01 6.746e+01, threshold=5.867e+01, percent-clipped=1.0 2024-08-11 22:53:28,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1327120.0, ans=0.2 2024-08-11 22:53:58,119 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2300, loss[loss=0.07489, beats_loss=0.01176, ecapa_loss=0.0002144, whisper_loss=0.06099, over 13214.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01122, ecapa_loss=0.0001843, whisper_loss=0.09283, over 3860246.19 frames. ], batch size: 54, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:53:59,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1327320.0, ans=0.2 2024-08-11 22:54:04,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1327320.0, ans=0.0 2024-08-11 22:54:08,047 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
28 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 22:54:39,910 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-11 22:54:43,904 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2024-08-11 22:55:09,447 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 31 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 22:55:12,260 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2350, loss[loss=0.08933, beats_loss=0.009501, ecapa_loss=0.0002269, whisper_loss=0.07756, over 18338.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01123, ecapa_loss=0.0001845, whisper_loss=0.09285, over 3896793.98 frames. ], batch size: 76, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:55:16,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=1327820.0, ans=0.1 2024-08-11 22:55:27,563 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 22:55:33,557 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 22:55:52,517 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.549e+01 2.872e+01 3.307e+01 6.850e+01, threshold=5.744e+01, percent-clipped=1.0 2024-08-11 22:56:04,353 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 22:56:24,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1328320.0, ans=0.125 2024-08-11 22:56:25,746 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2400, loss[loss=0.1113, beats_loss=0.00996, ecapa_loss=0.0001802, whisper_loss=0.09958, over 16045.00 frames. 
], tot_loss[loss=0.1057, beats_loss=0.0112, ecapa_loss=0.0001864, whisper_loss=0.09265, over 3904426.42 frames. ], batch size: 58, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:56:27,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5 2024-08-11 22:56:31,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1328320.0, ans=0.0 2024-08-11 22:56:38,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1328420.0, ans=0.07 2024-08-11 22:56:41,384 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-11 22:56:55,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1328520.0, ans=0.125 2024-08-11 22:56:59,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1328520.0, ans=0.5 2024-08-11 22:57:10,814 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 17 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 22:57:32,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1328720.0, ans=0.125 2024-08-11 22:57:37,480 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-11 22:57:39,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1328720.0, ans=0.0 2024-08-11 22:57:41,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2450, loss[loss=0.06909, beats_loss=0.01409, ecapa_loss=0.0001877, whisper_loss=0.05312, over 19049.00 frames. 
], tot_loss[loss=0.1057, beats_loss=0.01119, ecapa_loss=0.0001867, whisper_loss=0.09262, over 3924357.94 frames. ], batch size: 81, lr: 6.43e-03, grad_scale: 1.152921504606847e+18
2024-08-11 22:57:48,213 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 35 from LS+wenet, 21 from Vox, 21 from AS
2024-08-11 22:57:54,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1328820.0, ans=0.125
2024-08-11 22:58:07,314 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 24 from Vox, 24 from AS
2024-08-11 22:58:10,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1329020.0, ans=0.125
2024-08-11 22:58:14,631 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 from AS
2024-08-11 22:58:21,541 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.487e+01 2.806e+01 3.226e+01 5.199e+01, threshold=5.611e+01, percent-clipped=0.0
2024-08-11 22:58:30,513 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 from AS
2024-08-11 22:58:42,750 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 14 from Vox, 32 from AS
2024-08-11 22:58:47,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1329220.0, ans=0.1
2024-08-11 22:58:49,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0
2024-08-11 22:58:56,091 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2500, loss[loss=0.1091, beats_loss=0.0109, ecapa_loss=0.0001995, whisper_loss=0.09616, over 19385.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01111, ecapa_loss=0.0001861, whisper_loss=0.09308, over 3916870.57 frames.
], batch size: 79, lr: 6.43e-03, grad_scale: 1.152921504606847e+18
2024-08-11 22:59:02,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1329320.0, ans=0.125
2024-08-11 22:59:06,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1329320.0, ans=0.0
2024-08-11 22:59:08,717 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 21 from Vox, 33 from AS
2024-08-11 22:59:38,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1329520.0, ans=0.125
2024-08-11 22:59:45,994 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 from AS
2024-08-11 22:59:49,008 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 from AS
2024-08-11 22:59:52,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0
2024-08-11 23:00:10,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1329720.0, ans=0.1
2024-08-11 23:00:14,383 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2550, loss[loss=0.1175, beats_loss=0.009367, ecapa_loss=0.0002037, whisper_loss=0.106, over 16028.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01108, ecapa_loss=0.0001856, whisper_loss=0.09334, over 3915034.60 frames. ], batch size: 66, lr: 6.43e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:00:19,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1329820.0, ans=0.1
2024-08-11 23:00:34,429 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts.
21 from LS+wenet, 22 from Vox, 42 from AS
2024-08-11 23:00:38,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0
2024-08-11 23:00:50,524 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 22 from Vox, 37 from AS
2024-08-11 23:00:54,999 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0
2024-08-11 23:00:57,047 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.600e+01 2.873e+01 3.308e+01 4.841e+01, threshold=5.745e+01, percent-clipped=0.0
2024-08-11 23:00:59,000 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 24 from LS+wenet, 15 from Vox, 21 from AS
2024-08-11 23:01:01,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1330120.0, ans=0.125
2024-08-11 23:01:07,272 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 from AS
2024-08-11 23:01:19,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1330220.0, ans=0.125
2024-08-11 23:01:24,011 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 from AS
2024-08-11 23:01:28,545 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 from AS
2024-08-11 23:01:34,696 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2600, loss[loss=0.1102, beats_loss=0.009847, ecapa_loss=0.0002307, whisper_loss=0.09808, over 17326.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01102, ecapa_loss=0.0001861, whisper_loss=0.09421, over 3892352.90 frames.
], batch size: 68, lr: 6.43e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:01:39,541 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 29 from Vox, 22 from AS
2024-08-11 23:01:40,132 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.94 vs. limit=15.0
2024-08-11 23:01:43,654 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 from AS
2024-08-11 23:01:45,487 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 29 from Vox, 33 from AS
2024-08-11 23:01:47,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1330320.0, ans=0.125
2024-08-11 23:01:59,395 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 from AS
2024-08-11 23:02:03,873 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 18 from LS+wenet, 24 from Vox, 38 from AS
2024-08-11 23:02:12,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1330520.0, ans=0.125
2024-08-11 23:02:12,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.76 vs. limit=22.5
2024-08-11 23:02:24,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1330620.0, ans=0.125
2024-08-11 23:02:24,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1330620.0, ans=0.0
2024-08-11 23:02:29,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1330620.0, ans=0.125
2024-08-11 23:02:33,666 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts.
27 from LS+wenet, 17 from Vox, 37 from AS
2024-08-11 23:02:37,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1330720.0, ans=0.1
2024-08-11 23:02:51,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2650, loss[loss=0.1161, beats_loss=0.01042, ecapa_loss=0.0001701, whisper_loss=0.104, over 23069.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01102, ecapa_loss=0.0001863, whisper_loss=0.09385, over 3895265.13 frames. ], batch size: 91, lr: 6.43e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:03:00,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1330820.0, ans=0.0
2024-08-11 23:03:08,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1330920.0, ans=0.2
2024-08-11 23:03:18,124 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 40 from LS+wenet, 23 from Vox, 31 from AS
2024-08-11 23:03:35,220 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.699e+01 2.993e+01 3.555e+01 9.155e+01, threshold=5.987e+01, percent-clipped=1.0
2024-08-11 23:03:46,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1331120.0, ans=0.1
2024-08-11 23:04:05,668 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 32 from LS+wenet, 30 from Vox, 34 from AS
2024-08-11 23:04:13,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2700, loss[loss=0.1008, beats_loss=0.01089, ecapa_loss=0.0001784, whisper_loss=0.08816, over 16801.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01113, ecapa_loss=0.0001856, whisper_loss=0.09319, over 3913325.12 frames.
], batch size: 63, lr: 6.43e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:04:17,872 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.57 vs. limit=15.0
2024-08-11 23:04:24,503 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 from AS
2024-08-11 23:04:41,837 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 from AS
2024-08-11 23:04:48,345 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 from AS
2024-08-11 23:05:07,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1331620.0, ans=0.0
2024-08-11 23:05:15,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1331720.0, ans=0.0
2024-08-11 23:05:17,246 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0
2024-08-11 23:05:32,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2750, loss[loss=0.1274, beats_loss=0.008733, ecapa_loss=0.0001761, whisper_loss=0.117, over 15724.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01108, ecapa_loss=0.0001847, whisper_loss=0.09346, over 3917300.80 frames. ], batch size: 58, lr: 6.42e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:05:36,474 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-11 23:05:56,341 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts.
20 from LS+wenet, 22 from Vox, 32 from AS
2024-08-11 23:06:03,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1331920.0, ans=0.125
2024-08-11 23:06:04,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1332020.0, ans=0.0
2024-08-11 23:06:09,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.33 vs. limit=22.5
2024-08-11 23:06:18,544 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.646e+01 2.996e+01 3.308e+01 5.705e+01, threshold=5.992e+01, percent-clipped=0.0
2024-08-11 23:06:23,237 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 22 from Vox, 32 from AS
2024-08-11 23:06:30,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1332120.0, ans=0.1
2024-08-11 23:06:46,940 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 29 from LS+wenet, 17 from Vox, 26 from AS
2024-08-11 23:06:54,840 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2800, loss[loss=0.1201, beats_loss=0.01154, ecapa_loss=0.0001815, whisper_loss=0.1068, over 23553.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01109, ecapa_loss=0.0001858, whisper_loss=0.09344, over 3883248.98 frames. ], batch size: 93, lr: 6.42e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:07:19,600 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts.
30 from LS+wenet, 14 from Vox, 33 from AS
2024-08-11 23:07:20,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1332420.0, ans=0.05
2024-08-11 23:07:36,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1332520.0, ans=0.0
2024-08-11 23:07:44,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1332620.0, ans=0.0
2024-08-11 23:07:46,591 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 from AS
2024-08-11 23:07:50,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0
2024-08-11 23:07:51,566 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.610e-02
2024-08-11 23:08:08,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1332720.0, ans=0.125
2024-08-11 23:08:12,997 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2850, loss[loss=0.1026, beats_loss=0.01124, ecapa_loss=0.0001989, whisper_loss=0.08932, over 22844.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01106, ecapa_loss=0.0001859, whisper_loss=0.09367, over 3885294.72 frames. ], batch size: 94, lr: 6.42e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:08:19,559 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0
2024-08-11 23:08:34,715 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 28 from Vox, 33 from AS
2024-08-11 23:08:43,262 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts.
22 from LS+wenet, 20 from Vox, 36 from AS
2024-08-11 23:08:48,505 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 31 from LS+wenet, 17 from Vox, 27 from AS
2024-08-11 23:08:50,729 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0
2024-08-11 23:08:52,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1333020.0, ans=0.1
2024-08-11 23:08:53,792 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 from AS
2024-08-11 23:08:56,470 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 from AS
2024-08-11 23:08:57,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.704e+01 2.991e+01 3.400e+01 6.217e+01, threshold=5.982e+01, percent-clipped=1.0
2024-08-11 23:09:03,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1333120.0, ans=0.2
2024-08-11 23:09:13,076 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 from AS
2024-08-11 23:09:14,521 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 29 from LS+wenet, 17 from Vox, 26 from AS
2024-08-11 23:09:33,852 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2900, loss[loss=0.09299, beats_loss=0.01043, ecapa_loss=0.0002049, whisper_loss=0.08051, over 22086.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01109, ecapa_loss=0.0001862, whisper_loss=0.09314, over 3910984.49 frames. ], batch size: 93, lr: 6.42e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:10:06,172 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts.
22 from LS+wenet, 16 from Vox, 28 from AS
2024-08-11 23:10:16,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1333520.0, ans=0.0
2024-08-11 23:10:25,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1333620.0, ans=0.0
2024-08-11 23:10:31,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1333620.0, ans=0.0
2024-08-11 23:10:39,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1333720.0, ans=0.0
2024-08-11 23:10:41,940 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 from AS
2024-08-11 23:10:48,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.04 vs. limit=6.0
2024-08-11 23:10:54,008 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 2950, loss[loss=0.1006, beats_loss=0.01221, ecapa_loss=0.000166, whisper_loss=0.08677, over 15432.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0111, ecapa_loss=0.0001854, whisper_loss=0.09296, over 3891610.44 frames. ], batch size: 58, lr: 6.42e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:10:54,461 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 from AS
2024-08-11 23:10:56,171 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.11 vs. limit=15.0
2024-08-11 23:11:05,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1333820.0, ans=0.125
2024-08-11 23:11:05,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=15.0
2024-08-11 23:11:10,327 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.31 vs. limit=12.0
2024-08-11 23:11:17,779 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 from AS
2024-08-11 23:11:31,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0
2024-08-11 23:11:32,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1334020.0, ans=0.07
2024-08-11 23:11:34,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1334020.0, ans=0.125
2024-08-11 23:11:39,666 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 from AS
2024-08-11 23:11:40,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1334020.0, ans=15.0
2024-08-11 23:11:40,762 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.642e+01 2.977e+01 3.342e+01 4.548e+01, threshold=5.953e+01, percent-clipped=0.0
2024-08-11 23:11:57,320 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts.
29 from LS+wenet, 28 from Vox, 36 from AS
2024-08-11 23:12:05,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1334220.0, ans=0.125
2024-08-11 23:12:14,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1334320.0, ans=0.125
2024-08-11 23:12:15,914 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3000, loss[loss=0.1078, beats_loss=0.01136, ecapa_loss=0.0001822, whisper_loss=0.09463, over 22253.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01109, ecapa_loss=0.0001853, whisper_loss=0.0934, over 3897359.20 frames. ], batch size: 90, lr: 6.42e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:12:15,914 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-11 23:12:58,439 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on ASR_libri: loss=0.2567, beats_loss=0, ecapa_loss=0.0006225, whisper_loss=0.2505, over 922467.00 frames.
2024-08-11 23:13:14,959 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on SV_voxceleb1: loss=0.004936, beats_loss=0, ecapa_loss=0.0004936, whisper_loss=0, over 939242.00 frames.
2024-08-11 23:15:19,843 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on AT_audioset: loss=0.02462, beats_loss=0.02462, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-11 23:15:19,849 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB
2024-08-11 23:15:48,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1334420.0, ans=0.2
2024-08-11 23:16:08,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1334620.0, ans=0.125
2024-08-11 23:16:12,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0
2024-08-11 23:16:35,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0
2024-08-11 23:16:39,421 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3050, loss[loss=0.1426, beats_loss=0.007545, ecapa_loss=0.0001503, whisper_loss=0.1336, over 15713.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01096, ecapa_loss=0.0001859, whisper_loss=0.09451, over 3886151.32 frames. ], batch size: 56, lr: 6.42e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:16:50,046 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0
2024-08-11 23:17:13,077 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
26 from LS+wenet, 24 from Vox, 44 from AS
2024-08-11 23:17:14,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1335020.0, ans=0.2
2024-08-11 23:17:19,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1335020.0, ans=0.0
2024-08-11 23:17:21,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.698e+01 2.929e+01 3.403e+01 4.861e+01, threshold=5.858e+01, percent-clipped=0.0
2024-08-11 23:17:22,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1335020.0, ans=0.1
2024-08-11 23:17:23,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1335120.0, ans=0.1
2024-08-11 23:17:36,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=12.0
2024-08-11 23:17:36,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1335120.0, ans=0.2
2024-08-11 23:17:52,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1335320.0, ans=0.1
2024-08-11 23:17:53,714 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3100, loss[loss=0.1157, beats_loss=0.01175, ecapa_loss=0.0001686, whisper_loss=0.1022, over 22942.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01094, ecapa_loss=0.0001861, whisper_loss=0.09515, over 3887477.07 frames. ], batch size: 91, lr: 6.42e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:17:59,576 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts.
26 from LS+wenet, 17 from Vox, 27 from AS
2024-08-11 23:18:06,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1335320.0, ans=0.125
2024-08-11 23:18:17,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1335420.0, ans=0.04949747468305833
2024-08-11 23:18:21,901 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-11 23:18:32,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1335520.0, ans=0.125
2024-08-11 23:18:39,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1335620.0, ans=0.0
2024-08-11 23:18:39,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1335620.0, ans=0.2
2024-08-11 23:18:57,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0
2024-08-11 23:19:01,193 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 from AS
2024-08-11 23:19:01,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1335720.0, ans=0.125
2024-08-11 23:19:10,235 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3150, loss[loss=0.1143, beats_loss=0.01123, ecapa_loss=0.0002352, whisper_loss=0.1007, over 21665.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01111, ecapa_loss=0.0001853, whisper_loss=0.0943, over 3892270.72 frames. ], batch size: 94, lr: 6.41e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:19:38,616 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts.
23 from LS+wenet, 10 from Vox, 23 from AS
2024-08-11 23:19:49,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1336020.0, ans=0.2
2024-08-11 23:19:52,676 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.477e+01 2.769e+01 3.279e+01 6.467e+01, threshold=5.538e+01, percent-clipped=1.0
2024-08-11 23:20:00,281 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 19 from Vox, 39 from AS
2024-08-11 23:20:07,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1336120.0, ans=0.125
2024-08-11 23:20:07,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1336120.0, ans=0.125
2024-08-11 23:20:09,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1336220.0, ans=0.125
2024-08-11 23:20:24,562 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3200, loss[loss=0.1104, beats_loss=0.01141, ecapa_loss=0.0002243, whisper_loss=0.09676, over 20787.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01106, ecapa_loss=0.000185, whisper_loss=0.09441, over 3888561.53 frames. ], batch size: 89, lr: 6.41e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:20:26,774 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts.
18 from LS+wenet, 23 from Vox, 23 from AS
2024-08-11 23:20:27,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1336320.0, ans=0.1
2024-08-11 23:20:32,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1336320.0, ans=0.125
2024-08-11 23:20:38,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1336420.0, ans=0.125
2024-08-11 23:20:39,391 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 19 from Vox, 46 from AS
2024-08-11 23:20:50,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1336420.0, ans=0.0
2024-08-11 23:20:55,692 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 27 from Vox, 24 from AS
2024-08-11 23:21:04,060 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 23 from Vox, 41 from AS
2024-08-11 23:21:06,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1336520.0, ans=0.0
2024-08-11 23:21:14,732 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 from AS
2024-08-11 23:21:16,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1336620.0, ans=0.0
2024-08-11 23:21:23,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1336720.0, ans=0.0
2024-08-11 23:21:23,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1336720.0, ans=0.0
2024-08-11 23:21:37,103 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3250, loss[loss=0.09196, beats_loss=0.01354, ecapa_loss=0.0001987, whisper_loss=0.07643, over 18371.00 frames.
], tot_loss[loss=0.1071, beats_loss=0.01106, ecapa_loss=0.0001856, whisper_loss=0.0942, over 3876036.61 frames. ], batch size: 79, lr: 6.41e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:21:43,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1336820.0, ans=0.04949747468305833
2024-08-11 23:21:44,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1336820.0, ans=0.125
2024-08-11 23:22:18,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.451e+01 2.863e+01 3.216e+01 4.803e+01, threshold=5.726e+01, percent-clipped=0.0
2024-08-11 23:22:19,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1337020.0, ans=0.1
2024-08-11 23:22:31,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1337120.0, ans=0.125
2024-08-11 23:22:52,289 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3300, loss[loss=0.1201, beats_loss=0.01158, ecapa_loss=0.0001633, whisper_loss=0.1069, over 18076.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01107, ecapa_loss=0.0001844, whisper_loss=0.09405, over 3875451.78 frames. ], batch size: 71, lr: 6.41e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:23:24,611 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 from AS
2024-08-11 23:23:31,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0
2024-08-11 23:23:36,215 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
27 from LS+wenet, 23 from Vox, 42 from AS
2024-08-11 23:23:39,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1337620.0, ans=0.125
2024-08-11 23:23:46,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1337620.0, ans=0.125
2024-08-11 23:23:57,371 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 26 from Vox, 26 from AS
2024-08-11 23:24:00,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.17 vs. limit=22.5
2024-08-11 23:24:01,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1337720.0, ans=0.025
2024-08-11 23:24:06,630 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3350, loss[loss=0.1093, beats_loss=0.0115, ecapa_loss=0.0001795, whisper_loss=0.09602, over 17848.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01109, ecapa_loss=0.0001854, whisper_loss=0.09312, over 3861959.74 frames. ], batch size: 69, lr: 6.41e-03, grad_scale: 1.152921504606847e+18
2024-08-11 23:24:14,093 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts.
19 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 23:24:28,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1337920.0, ans=0.1 2024-08-11 23:24:32,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1337920.0, ans=0.125 2024-08-11 23:24:45,857 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.630e+01 2.898e+01 3.415e+01 6.649e+01, threshold=5.796e+01, percent-clipped=1.0 2024-08-11 23:24:58,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1338120.0, ans=0.125 2024-08-11 23:25:02,566 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.400e-01 2024-08-11 23:25:12,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1338220.0, ans=0.125 2024-08-11 23:25:17,461 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3400, loss[loss=0.1276, beats_loss=0.008869, ecapa_loss=0.0001903, whisper_loss=0.1168, over 14694.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01107, ecapa_loss=0.0001859, whisper_loss=0.09351, over 3911523.36 frames. ], batch size: 55, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:25:18,488 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.37 vs. limit=22.5 2024-08-11 23:25:24,339 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 23:25:30,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.12 vs. 
limit=22.5 2024-08-11 23:25:35,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1338420.0, ans=0.125 2024-08-11 23:25:41,822 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 23:25:46,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1338520.0, ans=0.1 2024-08-11 23:25:53,003 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 23:26:08,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1338620.0, ans=0.05 2024-08-11 23:26:14,808 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-11 23:26:16,881 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 19 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-11 23:26:18,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-08-11 23:26:27,320 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3450, loss[loss=0.098, beats_loss=0.01025, ecapa_loss=0.0002081, whisper_loss=0.08567, over 22248.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01108, ecapa_loss=0.0001868, whisper_loss=0.09337, over 3903936.34 frames. 
], batch size: 93, lr: 6.41e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:26:32,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1338820.0, ans=0.125 2024-08-11 23:27:05,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1339020.0, ans=0.2 2024-08-11 23:27:08,689 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.641e+01 2.848e+01 3.378e+01 1.355e+02, threshold=5.696e+01, percent-clipped=1.0 2024-08-11 23:27:16,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1339120.0, ans=0.1 2024-08-11 23:27:17,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1339120.0, ans=0.1 2024-08-11 23:27:26,074 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-11 23:27:31,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1339220.0, ans=0.2 2024-08-11 23:27:38,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3500, loss[loss=0.1092, beats_loss=0.01027, ecapa_loss=0.0002044, whisper_loss=0.09689, over 22498.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01106, ecapa_loss=0.0001886, whisper_loss=0.0936, over 3898307.11 frames. 
], batch size: 91, lr: 6.41e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:27:40,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1339320.0, ans=0.125 2024-08-11 23:27:47,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1339320.0, ans=0.025 2024-08-11 23:27:47,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1339320.0, ans=0.0 2024-08-11 23:28:02,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1339420.0, ans=0.2 2024-08-11 23:28:05,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1339420.0, ans=0.07 2024-08-11 23:28:15,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1339520.0, ans=0.0 2024-08-11 23:28:18,154 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 23:28:19,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1339520.0, ans=0.125 2024-08-11 23:28:24,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1339620.0, ans=0.1 2024-08-11 23:28:35,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1339720.0, ans=0.0 2024-08-11 23:28:43,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1339720.0, ans=0.0 2024-08-11 23:28:46,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1339720.0, ans=0.0 2024-08-11 23:28:50,356 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3550, loss[loss=0.1191, beats_loss=0.01006, ecapa_loss=0.0002017, whisper_loss=0.107, over 17237.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01109, ecapa_loss=0.0001875, whisper_loss=0.0932, over 3894717.29 frames. ], batch size: 71, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:28:52,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1339820.0, ans=0.2 2024-08-11 23:29:04,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1339920.0, ans=0.125 2024-08-11 23:29:10,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2024-08-11 23:29:21,944 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 23:29:31,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1340020.0, ans=0.125 2024-08-11 23:29:32,284 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.587e+01 2.900e+01 3.239e+01 4.496e+01, threshold=5.800e+01, percent-clipped=0.0 2024-08-11 23:30:04,484 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 23:30:06,441 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3600, loss[loss=0.1004, beats_loss=0.009079, ecapa_loss=0.0001848, whisper_loss=0.08943, over 16820.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01116, ecapa_loss=0.0001862, whisper_loss=0.09262, over 3848331.67 frames. ], batch size: 64, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:30:12,369 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2024-08-11 23:30:17,358 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 23:30:32,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1340420.0, ans=0.1 2024-08-11 23:30:34,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0 2024-08-11 23:30:38,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=15.0 2024-08-11 23:30:41,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=15.0 2024-08-11 23:30:42,156 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 23:30:48,844 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.789e-02 2024-08-11 23:30:56,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1340620.0, ans=0.125 2024-08-11 23:31:01,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1340620.0, ans=0.125 2024-08-11 23:31:23,302 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3650, loss[loss=0.08819, beats_loss=0.01303, ecapa_loss=0.000173, whisper_loss=0.07343, over 14411.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01115, ecapa_loss=0.000186, whisper_loss=0.09291, over 3808244.28 frames. ], batch size: 58, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:31:25,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1340820.0, ans=0.0 2024-08-11 23:31:43,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1340920.0, ans=0.125 2024-08-11 23:31:58,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1341020.0, ans=0.2 2024-08-11 23:32:00,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.74 vs. 
limit=15.0 2024-08-11 23:32:08,367 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.620e+01 2.825e+01 3.170e+01 6.141e+01, threshold=5.649e+01, percent-clipped=1.0 2024-08-11 23:32:08,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1341120.0, ans=0.125 2024-08-11 23:32:13,572 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 23:32:15,039 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 23:32:19,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=12.0 2024-08-11 23:32:27,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.39 vs. limit=15.0 2024-08-11 23:32:41,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3700, loss[loss=0.1008, beats_loss=0.01175, ecapa_loss=0.0001938, whisper_loss=0.08714, over 22182.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01113, ecapa_loss=0.0001868, whisper_loss=0.09318, over 3828959.56 frames. ], batch size: 89, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:32:47,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1341320.0, ans=0.125 2024-08-11 23:33:13,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1341520.0, ans=0.125 2024-08-11 23:33:19,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.69 vs. 
limit=12.0 2024-08-11 23:33:57,673 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3750, loss[loss=0.1008, beats_loss=0.01406, ecapa_loss=0.0001336, whisper_loss=0.08541, over 18995.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01121, ecapa_loss=0.0001853, whisper_loss=0.09302, over 3822635.90 frames. ], batch size: 76, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:34:00,596 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 23:34:09,816 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 23:34:14,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1341920.0, ans=0.125 2024-08-11 23:34:30,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1342020.0, ans=0.125 2024-08-11 23:34:34,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1342020.0, ans=0.0 2024-08-11 23:34:41,317 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.517e+01 2.756e+01 3.054e+01 4.813e+01, threshold=5.513e+01, percent-clipped=0.0 2024-08-11 23:34:43,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1342120.0, ans=0.125 2024-08-11 23:34:49,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1342120.0, ans=0.2 2024-08-11 23:34:52,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1342120.0, ans=0.05 2024-08-11 23:35:04,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1342220.0, ans=0.0 
2024-08-11 23:35:11,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3800, loss[loss=0.1161, beats_loss=0.01263, ecapa_loss=0.0001812, whisper_loss=0.1016, over 21823.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01122, ecapa_loss=0.0001861, whisper_loss=0.09298, over 3833229.43 frames. ], batch size: 88, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:35:27,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=19.60 vs. limit=15.0 2024-08-11 23:35:37,962 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 23:35:56,916 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 23:36:25,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3850, loss[loss=0.09125, beats_loss=0.0114, ecapa_loss=0.000234, whisper_loss=0.07751, over 18363.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0112, ecapa_loss=0.0001852, whisper_loss=0.09267, over 3831475.90 frames. ], batch size: 78, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:36:44,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1342920.0, ans=0.125 2024-08-11 23:36:52,219 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-11 23:36:52,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1343020.0, ans=0.125 2024-08-11 23:37:00,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1343020.0, ans=0.0 2024-08-11 23:37:04,354 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.584e+01 2.893e+01 3.554e+01 5.203e+01, threshold=5.787e+01, percent-clipped=0.0 2024-08-11 23:37:09,007 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 23:37:17,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1343120.0, ans=0.0 2024-08-11 23:37:18,274 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 23:37:22,468 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 23:37:32,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=12.0 2024-08-11 23:37:33,257 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3900, loss[loss=0.1123, beats_loss=0.01262, ecapa_loss=0.0001978, whisper_loss=0.09773, over 21595.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01112, ecapa_loss=0.0001873, whisper_loss=0.09344, over 3830939.21 frames. ], batch size: 89, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:37:33,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1343320.0, ans=0.1 2024-08-11 23:37:55,692 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 23:38:10,776 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
14 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 23:38:12,070 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 23:38:13,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1343620.0, ans=0.0 2024-08-11 23:38:37,543 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.659e-01 2024-08-11 23:38:38,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1343720.0, ans=0.1 2024-08-11 23:38:41,190 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 23:38:42,244 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 3950, loss[loss=0.1057, beats_loss=0.0115, ecapa_loss=0.0001878, whisper_loss=0.09233, over 22865.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01102, ecapa_loss=0.0001882, whisper_loss=0.09444, over 3869613.25 frames. ], batch size: 92, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:39:03,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1343920.0, ans=0.1 2024-08-11 23:39:08,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. 
limit=15.0 2024-08-11 23:39:11,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1344020.0, ans=0.0 2024-08-11 23:39:22,167 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.690e+01 2.958e+01 3.481e+01 5.578e+01, threshold=5.915e+01, percent-clipped=0.0 2024-08-11 23:39:25,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1344120.0, ans=0.1 2024-08-11 23:39:43,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1344220.0, ans=0.125 2024-08-11 23:39:51,059 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4000, loss[loss=0.09318, beats_loss=0.01285, ecapa_loss=0.0001591, whisper_loss=0.07874, over 21772.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01104, ecapa_loss=0.0001886, whisper_loss=0.09421, over 3870779.41 frames. ], batch size: 88, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:39:52,737 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 23:39:58,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1344320.0, ans=0.125 2024-08-11 23:40:00,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1344320.0, ans=0.0 2024-08-11 23:40:02,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1344320.0, ans=0.1 2024-08-11 23:40:10,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1344420.0, ans=0.125 2024-08-11 23:40:14,153 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 23:40:18,264 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 23:40:26,125 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 23:40:31,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0 2024-08-11 23:40:53,032 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.66 vs. limit=22.5 2024-08-11 23:41:00,531 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4050, loss[loss=0.1076, beats_loss=0.01216, ecapa_loss=0.0001884, whisper_loss=0.09357, over 20319.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01112, ecapa_loss=0.0001892, whisper_loss=0.09347, over 3896047.54 frames. ], batch size: 82, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:41:05,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1344820.0, ans=0.5 2024-08-11 23:41:20,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1344920.0, ans=0.125 2024-08-11 23:41:33,263 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 23:41:34,705 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
23 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 23:41:39,892 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.615e+01 3.014e+01 3.367e+01 5.886e+01, threshold=6.027e+01, percent-clipped=0.0 2024-08-11 23:41:41,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1345120.0, ans=0.125 2024-08-11 23:41:47,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1345120.0, ans=0.125 2024-08-11 23:41:47,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.58 vs. limit=10.0 2024-08-11 23:42:08,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4100, loss[loss=0.1221, beats_loss=0.009365, ecapa_loss=0.0001653, whisper_loss=0.111, over 22682.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01109, ecapa_loss=0.0001893, whisper_loss=0.09352, over 3907651.49 frames. ], batch size: 87, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:42:19,936 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-11 23:42:22,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.22 vs. limit=12.0 2024-08-11 23:42:32,324 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-11 23:42:32,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1345420.0, ans=0.125 2024-08-11 23:42:41,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. 
limit=15.0 2024-08-11 23:42:55,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1345620.0, ans=0.2 2024-08-11 23:43:11,215 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-11 23:43:13,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1345720.0, ans=0.125 2024-08-11 23:43:18,111 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4150, loss[loss=0.08157, beats_loss=0.01238, ecapa_loss=0.0001697, whisper_loss=0.0675, over 18176.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01116, ecapa_loss=0.0001886, whisper_loss=0.09351, over 3907195.39 frames. ], batch size: 74, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:43:37,563 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 22 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-11 23:43:41,932 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 23:43:58,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.669e+01 2.869e+01 3.344e+01 4.634e+01, threshold=5.739e+01, percent-clipped=0.0 2024-08-11 23:44:00,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1346120.0, ans=0.125 2024-08-11 23:44:05,649 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
22 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 23:44:12,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1346220.0, ans=0.0 2024-08-11 23:44:16,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1346220.0, ans=0.2 2024-08-11 23:44:27,808 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4200, loss[loss=0.1197, beats_loss=0.01065, ecapa_loss=0.0001773, whisper_loss=0.1072, over 14702.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01117, ecapa_loss=0.000189, whisper_loss=0.09322, over 3899979.71 frames. ], batch size: 57, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:44:41,874 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.517e+02 2024-08-11 23:44:43,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1346420.0, ans=0.125 2024-08-11 23:44:44,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1346420.0, ans=0.2 2024-08-11 23:44:49,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.83 vs. limit=6.0 2024-08-11 23:44:50,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0 2024-08-11 23:44:56,700 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 28 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 23:45:06,273 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
18 from LS+wenet, 24 from Vox, 31 from AS
2024-08-11 23:45:12,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1346620.0, ans=0.0
2024-08-11 23:45:15,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1346620.0, ans=0.0
2024-08-11 23:45:15,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1346620.0, ans=0.1
2024-08-11 23:45:17,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1346620.0, ans=0.0
2024-08-11 23:45:17,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1346620.0, ans=0.125
2024-08-11 23:45:36,383 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4250, loss[loss=0.09113, beats_loss=0.01353, ecapa_loss=0.0001658, whisper_loss=0.07594, over 21361.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01116, ecapa_loss=0.0001883, whisper_loss=0.09265, over 3894693.61 frames. ], batch size: 87, lr: 6.39e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:45:36,518 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 17 from Vox, 39 from AS
2024-08-11 23:45:38,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1346820.0, ans=0.125
2024-08-11 23:45:38,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1346820.0, ans=0.125
2024-08-11 23:45:41,957 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 from AS
2024-08-11 23:45:55,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1346920.0, ans=0.125
2024-08-11 23:46:05,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1347020.0, ans=0.125
2024-08-11 23:46:10,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1347020.0, ans=0.125
2024-08-11 23:46:17,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.600e+01 2.838e+01 3.253e+01 4.399e+01, threshold=5.676e+01, percent-clipped=0.0
2024-08-11 23:46:20,314 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 from AS
2024-08-11 23:46:21,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0
2024-08-11 23:46:23,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1347120.0, ans=0.1
2024-08-11 23:46:24,890 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0
2024-08-11 23:46:35,597 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 from AS
2024-08-11 23:46:40,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1347220.0, ans=0.125
2024-08-11 23:46:42,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1347220.0, ans=0.0
2024-08-11 23:46:46,091 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4300, loss[loss=0.09791, beats_loss=0.01026, ecapa_loss=0.0001822, whisper_loss=0.08583, over 21506.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01124, ecapa_loss=0.0001871, whisper_loss=0.09206, over 3896031.14 frames. ], batch size: 85, lr: 6.39e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:46:57,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1347320.0, ans=0.125
2024-08-11 23:47:00,306 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 33 from Vox, 37 from AS
2024-08-11 23:47:02,374 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0
2024-08-11 23:47:12,944 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 22 from Vox, 23 from AS
2024-08-11 23:47:13,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1347520.0, ans=0.0
2024-08-11 23:47:25,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1347520.0, ans=0.2
2024-08-11 23:47:25,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1347520.0, ans=0.125
2024-08-11 23:47:56,019 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4350, loss[loss=0.09541, beats_loss=0.01206, ecapa_loss=0.0001941, whisper_loss=0.08141, over 19195.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01115, ecapa_loss=0.0001881, whisper_loss=0.09261, over 3915887.71 frames. ], batch size: 80, lr: 6.39e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:47:57,534 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 from AS
2024-08-11 23:47:58,774 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 from AS
2024-08-11 23:47:59,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1347820.0, ans=0.125
2024-08-11 23:48:00,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=1347820.0, ans=0.2
2024-08-11 23:48:31,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1348020.0, ans=0.1
2024-08-11 23:48:36,332 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.570e+01 2.850e+01 3.397e+01 5.504e+01, threshold=5.701e+01, percent-clipped=0.0
2024-08-11 23:48:36,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1348120.0, ans=0.0
2024-08-11 23:48:43,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0
2024-08-11 23:48:45,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1348120.0, ans=0.0
2024-08-11 23:48:47,599 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 23 from Vox, 34 from AS
2024-08-11 23:49:01,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1348220.0, ans=0.125
2024-08-11 23:49:04,584 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 27 from Vox, 31 from AS
2024-08-11 23:49:07,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4400, loss[loss=0.1254, beats_loss=0.01115, ecapa_loss=0.0002103, whisper_loss=0.1122, over 19252.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01111, ecapa_loss=0.0001872, whisper_loss=0.09248, over 3892459.91 frames. ], batch size: 78, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:49:13,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1348320.0, ans=0.1
2024-08-11 23:49:14,327 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 from AS
2024-08-11 23:49:23,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1348420.0, ans=0.125
2024-08-11 23:49:23,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.23 vs. limit=22.5
2024-08-11 23:49:28,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1348420.0, ans=0.125
2024-08-11 23:49:41,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1348520.0, ans=0.125
2024-08-11 23:49:57,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1348620.0, ans=15.0
2024-08-11 23:50:01,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1348620.0, ans=0.1
2024-08-11 23:50:19,557 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4450, loss[loss=0.1083, beats_loss=0.01012, ecapa_loss=0.0001499, whisper_loss=0.09666, over 16319.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01107, ecapa_loss=0.0001867, whisper_loss=0.09286, over 3882515.60 frames. ], batch size: 62, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:50:20,002 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 13 from Vox, 40 from AS
2024-08-11 23:50:50,063 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 from AS
2024-08-11 23:50:58,513 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 from AS
2024-08-11 23:51:01,168 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.643e+01 3.000e+01 3.439e+01 5.029e+01, threshold=6.000e+01, percent-clipped=0.0
2024-08-11 23:51:04,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1349120.0, ans=0.125
2024-08-11 23:51:09,697 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 15 from Vox, 33 from AS
2024-08-11 23:51:29,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4500, loss[loss=0.1096, beats_loss=0.01016, ecapa_loss=0.0001619, whisper_loss=0.0978, over 14241.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01102, ecapa_loss=0.0001876, whisper_loss=0.09292, over 3883574.97 frames. ], batch size: 53, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:51:31,602 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 8 from LS+wenet, 20 from Vox, 32 from AS
2024-08-11 23:51:31,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.66 vs. limit=22.5
2024-08-11 23:51:33,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1349320.0, ans=0.125
2024-08-11 23:51:38,763 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 from AS
2024-08-11 23:51:39,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1349320.0, ans=0.1
2024-08-11 23:51:39,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1349320.0, ans=0.125
2024-08-11 23:51:39,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1349320.0, ans=15.0
2024-08-11 23:51:40,508 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.279e+00
2024-08-11 23:51:44,130 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 from AS
2024-08-11 23:51:53,780 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 from AS
2024-08-11 23:51:55,195 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 from AS
2024-08-11 23:51:57,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1349520.0, ans=0.0
2024-08-11 23:52:00,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=15.0
2024-08-11 23:52:04,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1349520.0, ans=0.125
2024-08-11 23:52:09,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1349520.0, ans=0.0
2024-08-11 23:52:25,848 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0
2024-08-11 23:52:38,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4550, loss[loss=0.1016, beats_loss=0.01153, ecapa_loss=0.0001643, whisper_loss=0.08847, over 16640.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0111, ecapa_loss=0.0001864, whisper_loss=0.09236, over 3909981.71 frames. ], batch size: 64, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:52:44,478 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 27 from Vox, 44 from AS
2024-08-11 23:53:00,627 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 23 from Vox, 23 from AS
2024-08-11 23:53:02,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1349920.0, ans=0.0
2024-08-11 23:53:04,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1349920.0, ans=0.2
2024-08-11 23:53:08,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1350020.0, ans=0.125
2024-08-11 23:53:16,694 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 from AS
2024-08-11 23:53:19,056 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.529e+01 2.869e+01 3.379e+01 6.425e+01, threshold=5.739e+01, percent-clipped=1.0
2024-08-11 23:53:23,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1350120.0, ans=0.0
2024-08-11 23:53:24,760 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 22 from Vox, 32 from AS
2024-08-11 23:53:29,779 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 from AS
2024-08-11 23:53:33,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1350220.0, ans=0.125
2024-08-11 23:53:48,162 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4600, loss[loss=0.1094, beats_loss=0.007871, ecapa_loss=0.0002535, whisper_loss=0.09902, over 17654.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0111, ecapa_loss=0.0001874, whisper_loss=0.09266, over 3912248.99 frames. ], batch size: 71, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:53:48,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1350320.0, ans=0.95
2024-08-11 23:53:59,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1350320.0, ans=0.0
2024-08-11 23:54:08,584 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 29 from Vox, 40 from AS
2024-08-11 23:54:18,434 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 13 from Vox, 29 from AS
2024-08-11 23:54:29,960 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 22 from Vox, 28 from AS
2024-08-11 23:54:44,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=22.5
2024-08-11 23:54:57,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1350720.0, ans=0.2
2024-08-11 23:55:00,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4650, loss[loss=0.106, beats_loss=0.01187, ecapa_loss=0.0001845, whisper_loss=0.09231, over 22671.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01112, ecapa_loss=0.0001873, whisper_loss=0.09206, over 3882181.59 frames. ], batch size: 93, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:55:18,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1350920.0, ans=0.5
2024-08-11 23:55:32,176 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 20 from Vox, 51 from AS
2024-08-11 23:55:43,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.678e+01 3.059e+01 3.452e+01 5.229e+01, threshold=6.118e+01, percent-clipped=0.0
2024-08-11 23:55:50,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1351120.0, ans=0.0
2024-08-11 23:55:54,131 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 12 from Vox, 38 from AS
2024-08-11 23:55:57,136 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 13 from Vox, 33 from AS
2024-08-11 23:56:00,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1351220.0, ans=0.125
2024-08-11 23:56:00,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=12.0
2024-08-11 23:56:13,794 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4700, loss[loss=0.09152, beats_loss=0.01315, ecapa_loss=0.0001418, whisper_loss=0.07695, over 15189.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01107, ecapa_loss=0.0001876, whisper_loss=0.09235, over 3851960.27 frames. ], batch size: 59, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:56:18,249 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 from AS
2024-08-11 23:56:29,565 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 from AS
2024-08-11 23:56:58,592 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 from AS
2024-08-11 23:57:09,646 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 from AS
2024-08-11 23:57:10,540 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.59 vs. limit=22.5
2024-08-11 23:57:13,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1351720.0, ans=0.125
2024-08-11 23:57:27,193 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4750, loss[loss=0.1226, beats_loss=0.009292, ecapa_loss=0.0001666, whisper_loss=0.1116, over 19131.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01108, ecapa_loss=0.0001875, whisper_loss=0.0926, over 3875152.87 frames. ], batch size: 74, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:57:27,460 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 from AS
2024-08-11 23:57:45,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1351920.0, ans=0.125
2024-08-11 23:57:51,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.31 vs. limit=15.0
2024-08-11 23:57:53,807 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 22 from Vox, 26 from AS
2024-08-11 23:58:09,893 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.685e+01 3.044e+01 3.744e+01 5.202e+01, threshold=6.087e+01, percent-clipped=0.0
2024-08-11 23:58:22,104 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 from AS
2024-08-11 23:58:37,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1352220.0, ans=0.125
2024-08-11 23:58:40,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1352320.0, ans=0.1
2024-08-11 23:58:41,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4800, loss[loss=0.1123, beats_loss=0.00988, ecapa_loss=0.0002359, whisper_loss=0.1001, over 22190.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01116, ecapa_loss=0.0001883, whisper_loss=0.09188, over 3887165.95 frames. ], batch size: 91, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:58:48,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1352320.0, ans=0.0
2024-08-11 23:59:32,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1352620.0, ans=0.125
2024-08-11 23:59:36,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1352620.0, ans=0.125
2024-08-11 23:59:38,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1352620.0, ans=0.125
2024-08-11 23:59:53,439 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 13 from Vox, 32 from AS
2024-08-11 23:59:54,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1352720.0, ans=0.1
2024-08-11 23:59:57,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4850, loss[loss=0.09424, beats_loss=0.009674, ecapa_loss=0.0001948, whisper_loss=0.08262, over 14411.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01122, ecapa_loss=0.0001878, whisper_loss=0.09168, over 3915048.15 frames. ], batch size: 60, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:00:00,494 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 24 from Vox, 20 from AS
2024-08-12 00:00:14,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1352920.0, ans=0.07
2024-08-12 00:00:17,293 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 from AS
2024-08-12 00:00:17,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1352920.0, ans=0.125
2024-08-12 00:00:24,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1352920.0, ans=0.04949747468305833
2024-08-12 00:00:28,233 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 from AS
2024-08-12 00:00:32,624 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 from AS
2024-08-12 00:00:39,471 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.736e+01 3.106e+01 3.475e+01 1.081e+02, threshold=6.213e+01, percent-clipped=2.0
2024-08-12 00:00:39,714 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 from AS
2024-08-12 00:00:40,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1353120.0, ans=0.5
2024-08-12 00:01:05,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1353220.0, ans=22.5
2024-08-12 00:01:10,557 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4900, loss[loss=0.097, beats_loss=0.009627, ecapa_loss=0.0002812, whisper_loss=0.08456, over 20520.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01126, ecapa_loss=0.0001863, whisper_loss=0.09139, over 3894415.52 frames. ], batch size: 91, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:01:15,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1353320.0, ans=0.1
2024-08-12 00:01:15,981 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.84 vs. limit=10.0
2024-08-12 00:01:54,892 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 20 from LS+wenet, 23 from Vox, 39 from AS
2024-08-12 00:01:57,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1353620.0, ans=0.025
2024-08-12 00:02:03,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1353620.0, ans=0.09899494936611666
2024-08-12 00:02:11,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1353720.0, ans=0.125
2024-08-12 00:02:20,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1353720.0, ans=0.0
2024-08-12 00:02:24,418 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 4950, loss[loss=0.1332, beats_loss=0.009981, ecapa_loss=0.0001994, whisper_loss=0.1212, over 22744.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01125, ecapa_loss=0.0001869, whisper_loss=0.09115, over 3851039.50 frames. ], batch size: 89, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:02:52,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1353920.0, ans=0.0
2024-08-12 00:02:52,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.31 vs. limit=10.0
2024-08-12 00:02:57,730 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 16 from Vox, 37 from AS
2024-08-12 00:02:58,169 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.68 vs. limit=10.0
2024-08-12 00:03:05,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1354020.0, ans=0.1
2024-08-12 00:03:07,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.588e+01 2.867e+01 3.231e+01 4.752e+01, threshold=5.733e+01, percent-clipped=0.0
2024-08-12 00:03:33,335 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 20 from Vox, 39 from AS
2024-08-12 00:03:35,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=1354220.0, ans=0.1
2024-08-12 00:03:37,294 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5000, loss[loss=0.09826, beats_loss=0.01129, ecapa_loss=0.0002067, whisper_loss=0.0849, over 22252.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01118, ecapa_loss=0.0001877, whisper_loss=0.09163, over 3855759.36 frames. ], batch size: 92, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:03:39,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1354320.0, ans=0.125
2024-08-12 00:04:00,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1354420.0, ans=0.125
2024-08-12 00:04:13,640 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 from AS
2024-08-12 00:04:18,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1354520.0, ans=0.125
2024-08-12 00:04:22,092 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 from AS
2024-08-12 00:04:25,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1354620.0, ans=0.5
2024-08-12 00:04:31,452 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 from AS
2024-08-12 00:04:46,170 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 from AS
2024-08-12 00:04:48,920 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5050, loss[loss=0.1236, beats_loss=0.01086, ecapa_loss=0.0001447, whisper_loss=0.1113, over 23752.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01116, ecapa_loss=0.0001875, whisper_loss=0.0927, over 3884223.30 frames. ], batch size: 90, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:04:50,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1354820.0, ans=0.0
2024-08-12 00:04:52,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1354820.0, ans=0.125
2024-08-12 00:04:54,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1354820.0, ans=0.125
2024-08-12 00:05:29,966 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.677e+01 3.041e+01 3.640e+01 6.697e+01, threshold=6.081e+01, percent-clipped=3.0
2024-08-12 00:05:58,515 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0
2024-08-12 00:06:00,459 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5100, loss[loss=0.1216, beats_loss=0.01033, ecapa_loss=0.0002057, whisper_loss=0.1092, over 21162.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01123, ecapa_loss=0.0001854, whisper_loss=0.09341, over 3912588.08 frames. ], batch size: 82, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:06:05,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1355320.0, ans=0.0
2024-08-12 00:06:06,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1355320.0, ans=0.0
2024-08-12 00:06:09,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1355320.0, ans=0.125
2024-08-12 00:06:16,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1355420.0, ans=0.0
2024-08-12 00:06:25,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1355420.0, ans=0.1
2024-08-12 00:06:39,723 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.549e-01
2024-08-12 00:06:40,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0
2024-08-12 00:06:57,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1355720.0, ans=0.125
2024-08-12 00:06:58,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1355720.0, ans=0.0
2024-08-12 00:07:00,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1355720.0, ans=0.125
2024-08-12 00:07:05,001 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 30 from Vox, 31 from AS
2024-08-12 00:07:07,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1355720.0, ans=0.125
2024-08-12 00:07:09,936 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5150, loss[loss=0.1153, beats_loss=0.01068, ecapa_loss=0.0001713, whisper_loss=0.1029, over 23040.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01112, ecapa_loss=0.000185, whisper_loss=0.09416, over 3905583.19 frames. ], batch size: 91, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:07:19,950 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 from AS
2024-08-12 00:07:23,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1355920.0, ans=0.2
2024-08-12 00:07:23,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=12.0
2024-08-12 00:07:40,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1356020.0, ans=0.07
2024-08-12 00:07:45,696 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 13 from Vox, 28 from AS
2024-08-12 00:07:50,904 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.562e+01 2.961e+01 3.572e+01 5.621e+01, threshold=5.922e+01, percent-clipped=0.0
2024-08-12 00:07:51,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1356120.0, ans=0.125
2024-08-12 00:07:52,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1356120.0, ans=0.5
2024-08-12 00:07:54,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1356120.0, ans=0.2
2024-08-12 00:08:00,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1356120.0, ans=0.2
2024-08-12 00:08:01,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1356120.0, ans=0.0
2024-08-12 00:08:08,240 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 from AS
2024-08-12 00:08:15,086 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 from AS
2024-08-12 00:08:15,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0
2024-08-12 00:08:16,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1356220.0, ans=0.0
2024-08-12 00:08:20,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5200, loss[loss=0.12, beats_loss=0.01075, ecapa_loss=0.0001595, whisper_loss=0.1077, over 22657.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01102, ecapa_loss=0.0001849, whisper_loss=0.09482, over 3923341.87 frames. ], batch size: 85, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:08:26,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1356320.0, ans=0.09899494936611666
2024-08-12 00:08:40,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1356420.0, ans=0.125
2024-08-12 00:08:41,839 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 16 from Vox, 40 from AS
2024-08-12 00:08:43,094 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 24 from Vox, 47 from AS
2024-08-12 00:09:28,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1356720.0, ans=0.0
2024-08-12 00:09:30,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.42 vs. limit=15.0
2024-08-12 00:09:30,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5250, loss[loss=0.1021, beats_loss=0.009009, ecapa_loss=0.0001901, whisper_loss=0.09115, over 13816.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01105, ecapa_loss=0.0001844, whisper_loss=0.09416, over 3899493.26 frames. ], batch size: 54, lr: 6.36e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:09:32,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1356820.0, ans=0.0
2024-08-12 00:09:42,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1356820.0, ans=0.125
2024-08-12 00:09:43,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1356920.0, ans=0.2
2024-08-12 00:10:11,252 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.537e+01 2.858e+01 3.258e+01 4.916e+01, threshold=5.717e+01, percent-clipped=0.0
2024-08-12 00:10:15,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1357120.0, ans=0.2
2024-08-12 00:10:19,275 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5
2024-08-12 00:10:40,568 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5300, loss[loss=0.09421, beats_loss=0.01255, ecapa_loss=0.0001885, whisper_loss=0.07978, over 18997.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01105, ecapa_loss=0.0001859, whisper_loss=0.09376, over 3874057.10 frames. ], batch size: 78, lr: 6.36e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:10:45,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1357320.0, ans=0.2
2024-08-12 00:10:57,509 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 32 from Vox, 34 from AS
2024-08-12 00:10:58,688 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 22 from Vox, 35 from AS
2024-08-12 00:11:04,216 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts.
19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 00:11:25,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1357620.0, ans=0.0 2024-08-12 00:11:29,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1357620.0, ans=0.125 2024-08-12 00:11:31,655 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.47 vs. limit=12.0 2024-08-12 00:11:35,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1357720.0, ans=0.1 2024-08-12 00:11:42,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=1357720.0, ans=0.02 2024-08-12 00:11:49,234 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5350, loss[loss=0.1189, beats_loss=0.009725, ecapa_loss=0.0001965, whisper_loss=0.1073, over 18211.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01105, ecapa_loss=0.000184, whisper_loss=0.09377, over 3888770.88 frames. ], batch size: 72, lr: 6.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:12:30,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.572e+01 2.816e+01 3.245e+01 5.813e+01, threshold=5.633e+01, percent-clipped=1.0 2024-08-12 00:12:44,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1358120.0, ans=0.0 2024-08-12 00:12:47,235 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
23 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 00:12:50,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1358220.0, ans=0.0 2024-08-12 00:13:01,876 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5400, loss[loss=0.1127, beats_loss=0.01028, ecapa_loss=0.000177, whisper_loss=0.1006, over 23289.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01098, ecapa_loss=0.0001853, whisper_loss=0.09412, over 3895298.60 frames. ], batch size: 91, lr: 6.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:13:11,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1358320.0, ans=0.125 2024-08-12 00:13:13,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1358320.0, ans=0.125 2024-08-12 00:13:22,240 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 00:13:36,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1358520.0, ans=0.2 2024-08-12 00:13:44,646 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-12 00:13:55,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1358620.0, ans=0.125 2024-08-12 00:14:03,035 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 00:14:18,645 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5450, loss[loss=0.1024, beats_loss=0.0112, ecapa_loss=0.0001742, whisper_loss=0.08947, over 19989.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01103, ecapa_loss=0.0001865, whisper_loss=0.09347, over 3894084.54 frames. 
], batch size: 80, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:14:27,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1358820.0, ans=0.2 2024-08-12 00:14:27,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1358820.0, ans=0.125 2024-08-12 00:14:52,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1359020.0, ans=0.125 2024-08-12 00:15:05,331 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.617e+01 2.957e+01 3.359e+01 7.305e+01, threshold=5.914e+01, percent-clipped=2.0 2024-08-12 00:15:12,606 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:15:36,288 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.46 vs. limit=15.0 2024-08-12 00:15:46,603 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5500, loss[loss=0.09828, beats_loss=0.01015, ecapa_loss=0.0002307, whisper_loss=0.08583, over 21214.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01098, ecapa_loss=0.0001873, whisper_loss=0.09306, over 3902305.92 frames. ], batch size: 92, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:15:50,765 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 25 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-12 00:16:04,009 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-12 00:16:14,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.88 vs. 
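The `grad_scale` values printed above are exact powers of two: `5.764607523034235e+17` is 2^59, and it doubles to `1.152921504606847e+18` (2^60) between batch 5400 and batch 5450. This is the behaviour of a dynamic AMP loss scaler that doubles the scale after a run of overflow-free steps (an assumption that icefall's scaler follows the stock `GradScaler` growth rule; the snippet only verifies the power-of-two structure):

```python
import math

# The two grad_scale values seen in this stretch of the log.
for scale in (5.764607523034235e+17, 1.152921504606847e+18):
    exp = math.log2(scale)
    print(f"{scale:.6e} = 2**{exp:.0f}")
```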
limit=22.5 2024-08-12 00:16:20,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1359420.0, ans=0.2 2024-08-12 00:16:28,101 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 00:16:29,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1359520.0, ans=0.0 2024-08-12 00:16:39,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1359520.0, ans=0.0 2024-08-12 00:16:41,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1359520.0, ans=0.125 2024-08-12 00:16:41,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1359520.0, ans=0.125 2024-08-12 00:16:51,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1359620.0, ans=0.125 2024-08-12 00:17:19,707 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5550, loss[loss=0.1019, beats_loss=0.008807, ecapa_loss=0.0001906, whisper_loss=0.09118, over 14579.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01102, ecapa_loss=0.0001881, whisper_loss=0.09304, over 3906555.62 frames. ], batch size: 57, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:17:40,791 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 00:18:04,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1360020.0, ans=0.05 2024-08-12 00:18:06,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1360020.0, ans=0.125 2024-08-12 00:18:08,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1360020.0, ans=0.125 2024-08-12 00:18:14,946 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.662e+01 3.000e+01 3.511e+01 5.450e+01, threshold=6.001e+01, percent-clipped=0.0 2024-08-12 00:18:18,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1360120.0, ans=0.0 2024-08-12 00:18:24,732 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 00:18:27,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.01 vs. 
limit=15.0 2024-08-12 00:18:32,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1360120.0, ans=0.0 2024-08-12 00:18:35,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1360220.0, ans=0.0 2024-08-12 00:18:45,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1360220.0, ans=0.125 2024-08-12 00:18:52,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1360320.0, ans=0.125 2024-08-12 00:18:53,358 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5600, loss[loss=0.1043, beats_loss=0.01187, ecapa_loss=0.0002032, whisper_loss=0.0904, over 19306.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01109, ecapa_loss=0.0001866, whisper_loss=0.09336, over 3915570.00 frames. ], batch size: 81, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:18:57,732 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 00:19:11,701 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 00:19:41,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1360520.0, ans=0.125 2024-08-12 00:19:54,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1360620.0, ans=0.2 2024-08-12 00:20:15,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1360720.0, ans=0.2 2024-08-12 00:20:16,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1360720.0, ans=0.1 2024-08-12 00:20:16,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1360720.0, ans=0.2 2024-08-12 00:20:24,795 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5650, loss[loss=0.0965, beats_loss=0.01244, ecapa_loss=0.0001952, whisper_loss=0.08211, over 21926.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01116, ecapa_loss=0.0001863, whisper_loss=0.09312, over 3936855.58 frames. ], batch size: 92, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:20:30,304 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 00:20:34,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1360820.0, ans=0.125 2024-08-12 00:20:36,210 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.63 vs. 
limit=15.0 2024-08-12 00:20:40,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1360920.0, ans=0.125 2024-08-12 00:20:45,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1360920.0, ans=0.07 2024-08-12 00:20:49,268 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-12 00:21:01,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1361020.0, ans=0.0 2024-08-12 00:21:04,065 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.079e+01 2.708e+01 3.179e+01 3.775e+01 1.197e+02, threshold=6.358e+01, percent-clipped=2.0 2024-08-12 00:21:04,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1361120.0, ans=0.0 2024-08-12 00:21:06,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1361120.0, ans=0.1 2024-08-12 00:21:10,853 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 00:21:11,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=12.0 2024-08-12 00:21:28,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2024-08-12 00:21:32,902 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5700, loss[loss=0.0932, beats_loss=0.01365, ecapa_loss=0.0001769, whisper_loss=0.07778, over 18703.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01119, ecapa_loss=0.0001876, whisper_loss=0.09318, over 3962291.97 frames. 
], batch size: 75, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:21:41,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1361320.0, ans=0.0 2024-08-12 00:21:47,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1361420.0, ans=0.0 2024-08-12 00:21:58,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2024-08-12 00:22:10,614 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-12 00:22:13,502 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 00:22:16,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1361620.0, ans=0.035 2024-08-12 00:22:20,252 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 00:22:24,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1361620.0, ans=0.0 2024-08-12 00:22:24,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.73 vs. limit=10.0 2024-08-12 00:22:40,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5750, loss[loss=0.1053, beats_loss=0.01185, ecapa_loss=0.0002077, whisper_loss=0.09142, over 17963.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01119, ecapa_loss=0.0001879, whisper_loss=0.09211, over 3956365.12 frames. 
], batch size: 75, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:22:48,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1361820.0, ans=0.2 2024-08-12 00:23:17,579 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-12 00:23:20,140 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.574e+01 2.789e+01 3.089e+01 4.490e+01, threshold=5.577e+01, percent-clipped=0.0 2024-08-12 00:23:38,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1362220.0, ans=0.2 2024-08-12 00:23:42,644 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 12 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 00:23:49,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5800, loss[loss=0.1111, beats_loss=0.01225, ecapa_loss=0.0001682, whisper_loss=0.09716, over 18415.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01119, ecapa_loss=0.0001869, whisper_loss=0.09225, over 3932897.26 frames. ], batch size: 71, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:23:55,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1362320.0, ans=0.125 2024-08-12 00:24:13,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1362420.0, ans=0.125 2024-08-12 00:24:17,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1362520.0, ans=0.1 2024-08-12 00:24:27,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1362520.0, ans=0.0 2024-08-12 00:24:34,002 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
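The `Clipping_scale=2.0, grad-norm quartiles ...` lines print five numbers, which read as the 0/25/50/75/100th percentiles of recent gradient norms; the reported `threshold` tracks `clipping_scale` times the median (e.g. 2.0 x 2.789e+01 matches the logged 5.577e+01 to the printed precision). A sketch under that assumption (this mirrors how the logged fields relate, not necessarily icefall's exact `optim.py` code):

```python
def clip_threshold(clipping_scale: float, quartiles: list[float]) -> float:
    """Threshold as clipping_scale * median of the five logged percentiles."""
    median = quartiles[2]  # middle value of [min, q1, median, q3, max]
    return clipping_scale * median

# Percentiles from the batch 5750 line; logged threshold was 5.577e+01.
print(clip_threshold(2.0, [20.03, 25.74, 27.89, 30.89, 44.90]))
```

`percent-clipped` then reports how often a batch's gradient norm exceeded that threshold; it stays near zero here, so clipping is rarely active.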
29 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-12 00:24:36,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1362620.0, ans=0.125 2024-08-12 00:24:41,887 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 00:24:43,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1362720.0, ans=0.2 2024-08-12 00:24:58,093 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5850, loss[loss=0.1012, beats_loss=0.01277, ecapa_loss=0.0001814, whisper_loss=0.08659, over 19438.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01123, ecapa_loss=0.0001861, whisper_loss=0.09191, over 3946337.18 frames. ], batch size: 78, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:25:09,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1362820.0, ans=0.1 2024-08-12 00:25:31,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.70 vs. limit=10.0 2024-08-12 00:25:37,560 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.515e+01 2.804e+01 3.095e+01 4.578e+01, threshold=5.608e+01, percent-clipped=0.0 2024-08-12 00:25:49,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1363120.0, ans=0.125 2024-08-12 00:25:54,460 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 00:26:06,435 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5900, loss[loss=0.08385, beats_loss=0.01299, ecapa_loss=0.0001941, whisper_loss=0.06892, over 21923.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0112, ecapa_loss=0.0001854, whisper_loss=0.09228, over 3932694.18 frames. 
], batch size: 92, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:26:12,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1363320.0, ans=0.0 2024-08-12 00:26:18,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2024-08-12 00:26:28,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1363420.0, ans=0.0 2024-08-12 00:26:29,969 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 00:26:34,758 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.93 vs. limit=15.0 2024-08-12 00:26:42,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1363520.0, ans=6.0 2024-08-12 00:26:45,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1363520.0, ans=0.125 2024-08-12 00:26:50,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1363620.0, ans=0.07 2024-08-12 00:26:53,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1363620.0, ans=0.125 2024-08-12 00:26:53,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.43 vs. limit=10.0 2024-08-12 00:27:05,235 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 00:27:08,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1363720.0, ans=15.0 2024-08-12 00:27:14,339 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 5950, loss[loss=0.09254, beats_loss=0.01174, ecapa_loss=0.0002172, whisper_loss=0.07863, over 18092.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01119, ecapa_loss=0.0001869, whisper_loss=0.09192, over 3903830.64 frames. ], batch size: 75, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:27:17,882 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.28 vs. limit=22.5 2024-08-12 00:27:18,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1363820.0, ans=0.05 2024-08-12 00:27:31,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1363920.0, ans=0.125 2024-08-12 00:27:47,006 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 32 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 00:27:52,232 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-12 00:27:53,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.619e+01 2.853e+01 3.292e+01 6.548e+01, threshold=5.706e+01, percent-clipped=1.0 2024-08-12 00:28:03,556 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 00:28:03,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1364120.0, ans=0.0 2024-08-12 00:28:05,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1364120.0, ans=0.125 2024-08-12 00:28:06,308 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 00:28:11,580 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 00:28:12,188 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2024-08-12 00:28:16,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1364220.0, ans=0.0 2024-08-12 00:28:22,347 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6000, loss[loss=0.1238, beats_loss=0.009049, ecapa_loss=0.0002075, whisper_loss=0.1127, over 22559.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01121, ecapa_loss=0.0001873, whisper_loss=0.09203, over 3878016.93 frames. ], batch size: 91, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:28:22,347 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 00:29:04,088 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on ASR_libri: loss=0.2569, beats_loss=0, ecapa_loss=0.0006172, whisper_loss=0.2508, over 922467.00 frames. 2024-08-12 00:29:22,656 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on SV_voxceleb1: loss=0.005036, beats_loss=0, ecapa_loss=0.0005036, whisper_loss=0, over 939242.00 frames. 2024-08-12 00:31:25,935 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on AT_audioset: loss=0.02463, beats_loss=0.02463, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 00:31:25,940 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 00:31:31,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1364320.0, ans=0.125 2024-08-12 00:31:38,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1364420.0, ans=0.0 2024-08-12 00:31:44,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2024-08-12 00:31:47,677 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-12 00:31:54,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1364520.0, ans=0.0 2024-08-12 00:32:00,087 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 00:32:26,434 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 00:32:33,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1364820.0, ans=0.125 2024-08-12 00:32:34,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6050, loss[loss=0.1141, beats_loss=0.01088, ecapa_loss=0.000145, whisper_loss=0.1017, over 18327.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01123, ecapa_loss=0.0001864, whisper_loss=0.09192, over 3859792.65 frames. ], batch size: 68, lr: 6.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:33:16,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.639e+01 2.972e+01 3.364e+01 6.267e+01, threshold=5.943e+01, percent-clipped=1.0 2024-08-12 00:33:19,303 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
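The batch 6000 validation above runs three separate passes, one per task, and in each pass only that task's loss is non-zero: ASR on LibriSpeech exercises the whisper and ecapa terms, speaker verification on VoxCeleb1 only the ecapa term, and audio tagging on AudioSet only the beats term. A small sketch with the values copied from those lines (the dict layout is illustrative, not icefall's data structure):

```python
# Validation losses from the "Epoch 10, batch 6000" block of this log.
valid = {
    "ASR_libri":    {"beats_loss": 0.0,     "ecapa_loss": 0.0006172, "whisper_loss": 0.2508},
    "SV_voxceleb1": {"beats_loss": 0.0,     "ecapa_loss": 0.0005036, "whisper_loss": 0.0},
    "AT_audioset":  {"beats_loss": 0.02463, "ecapa_loss": 0.0,       "whisper_loss": 0.0},
}

for name, losses in valid.items():
    active = [k for k, v in losses.items() if v > 0]
    print(name, "->", active)
```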
30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 00:33:23,669 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 00:33:33,056 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-12 00:33:37,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1365220.0, ans=0.125 2024-08-12 00:33:44,107 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6100, loss[loss=0.1209, beats_loss=0.00918, ecapa_loss=0.0001594, whisper_loss=0.1102, over 21565.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0112, ecapa_loss=0.0001862, whisper_loss=0.09236, over 3854924.37 frames. ], batch size: 80, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:34:02,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2024-08-12 00:34:03,083 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-12 00:34:03,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1365420.0, ans=0.1 2024-08-12 00:34:18,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1365520.0, ans=0.2 2024-08-12 00:34:33,795 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 00:34:49,027 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.64 vs. limit=22.5 2024-08-12 00:34:52,377 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
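The recurring `A total of N cuts. a from LS+wenet, b from Vox, c fro AS` lines report the per-dataset composition of each sampled batch ("fro" is a misspelling of "from" in the logger itself), and the three counts always sum to the total. A parsing sketch for tallying dataset proportions from this log (the function name is illustrative):

```python
import re

# Accept both the logger's "fro" and a corrected "from" before AS.
LINE_RE = re.compile(
    r"A total of (\d+) cuts\. (\d+) from LS\+wenet, (\d+) from Vox, (\d+) fro?m? AS"
)

def parse_cuts(line: str) -> dict:
    """Extract per-dataset cut counts from one batch-composition log line."""
    m = LINE_RE.search(line)
    total, ls, vox, audioset = map(int, m.groups())
    assert ls + vox + audioset == total  # the logged counts are self-consistent
    return {"LS+wenet": ls, "Vox": vox, "AS": audioset}

print(parse_cuts("A total of 84 cuts. 25 from LS+wenet, 29 from Vox, 30 fro AS"))
```

Summing these dicts over the log gives the effective mixing ratio of LibriSpeech, VoxCeleb, and AudioSet cuts under the repeat/subset settings in the startup config.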
27 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 00:34:54,836 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6150, loss[loss=0.1075, beats_loss=0.01329, ecapa_loss=0.0001202, whisper_loss=0.09304, over 21027.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01123, ecapa_loss=0.0001874, whisper_loss=0.09182, over 3872203.62 frames. ], batch size: 80, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:34:59,066 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 00:35:15,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1365920.0, ans=0.125 2024-08-12 00:35:22,337 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 00:35:26,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1366020.0, ans=0.2 2024-08-12 00:35:36,003 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.497e+01 2.771e+01 3.038e+01 4.710e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-12 00:35:48,544 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 00:35:55,238 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 00:35:56,710 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-12 00:36:03,273 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6200, loss[loss=0.1073, beats_loss=0.01061, ecapa_loss=0.0001311, whisper_loss=0.0954, over 14982.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0112, ecapa_loss=0.0001867, whisper_loss=0.09169, over 3848432.98 frames. 
], batch size: 55, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:36:04,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1366320.0, ans=0.125 2024-08-12 00:36:04,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1366320.0, ans=0.0 2024-08-12 00:36:05,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1366320.0, ans=0.125 2024-08-12 00:36:06,318 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-12 00:36:37,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1366520.0, ans=0.0 2024-08-12 00:36:41,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1366520.0, ans=0.0 2024-08-12 00:36:52,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1366620.0, ans=0.125 2024-08-12 00:37:03,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1366720.0, ans=0.2 2024-08-12 00:37:03,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1366720.0, ans=0.0 2024-08-12 00:37:06,965 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 00:37:12,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6250, loss[loss=0.1175, beats_loss=0.01188, ecapa_loss=0.0001517, whisper_loss=0.1042, over 18255.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01111, ecapa_loss=0.0001872, whisper_loss=0.09211, over 3863266.10 frames. 
], batch size: 71, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:37:13,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1366820.0, ans=0.125 2024-08-12 00:37:19,583 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 00:37:22,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1366820.0, ans=0.1 2024-08-12 00:37:53,859 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.633e+01 2.869e+01 3.281e+01 7.272e+01, threshold=5.739e+01, percent-clipped=3.0 2024-08-12 00:38:11,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1367220.0, ans=0.1 2024-08-12 00:38:16,611 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 00:38:21,927 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6300, loss[loss=0.1001, beats_loss=0.01094, ecapa_loss=0.0001782, whisper_loss=0.08738, over 22930.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01118, ecapa_loss=0.0001866, whisper_loss=0.0919, over 3903510.70 frames. ], batch size: 91, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:38:25,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1367320.0, ans=0.125 2024-08-12 00:38:28,639 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
16 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-12 00:38:33,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1367320.0, ans=0.125 2024-08-12 00:38:40,347 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:38:40,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1367420.0, ans=0.2 2024-08-12 00:38:48,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1367520.0, ans=0.125 2024-08-12 00:38:51,938 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 00:38:58,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1367520.0, ans=0.0 2024-08-12 00:39:04,785 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 00:39:06,596 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0 2024-08-12 00:39:12,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.85 vs. limit=15.0 2024-08-12 00:39:14,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1367620.0, ans=0.0 2024-08-12 00:39:23,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1367720.0, ans=0.0 2024-08-12 00:39:30,850 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6350, loss[loss=0.08524, beats_loss=0.009418, ecapa_loss=0.0001697, whisper_loss=0.07412, over 14943.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.0111, ecapa_loss=0.0001853, whisper_loss=0.09247, over 3899174.25 frames. ], batch size: 58, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:39:37,988 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 00:39:45,181 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 33 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 00:39:47,925 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-12 00:40:12,490 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.594e+01 2.991e+01 3.551e+01 3.558e+02, threshold=5.982e+01, percent-clipped=1.0 2024-08-12 00:40:12,674 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 35 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-12 00:40:23,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1368120.0, ans=0.05 2024-08-12 00:40:40,079 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6400, loss[loss=0.1116, beats_loss=0.01115, ecapa_loss=0.0001442, whisper_loss=0.09903, over 23645.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01113, ecapa_loss=0.0001849, whisper_loss=0.09279, over 3915337.61 frames. ], batch size: 91, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:40:41,925 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 00:40:43,129 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-12 00:41:01,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1368420.0, ans=0.2 2024-08-12 00:41:07,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1368520.0, ans=0.1 2024-08-12 00:41:22,039 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 14 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 00:41:23,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1368620.0, ans=10.0 2024-08-12 00:41:26,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1368620.0, ans=0.0 2024-08-12 00:41:29,018 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 00:41:31,521 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 00:41:37,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1368720.0, ans=0.0 2024-08-12 00:41:41,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1368720.0, ans=0.125 2024-08-12 00:41:49,058 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6450, loss[loss=0.08304, beats_loss=0.01321, ecapa_loss=0.0001827, whisper_loss=0.068, over 14474.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01115, ecapa_loss=0.000186, whisper_loss=0.09299, over 3927196.47 frames. 
], batch size: 57, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:41:53,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1368820.0, ans=0.09899494936611666 2024-08-12 00:42:16,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1369020.0, ans=0.125 2024-08-12 00:42:30,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.638e+01 2.996e+01 3.413e+01 4.809e+01, threshold=5.992e+01, percent-clipped=1.0 2024-08-12 00:42:34,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1369120.0, ans=0.1 2024-08-12 00:42:38,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1369120.0, ans=0.0 2024-08-12 00:42:45,942 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 00:42:47,147 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 00:42:58,165 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6500, loss[loss=0.08967, beats_loss=0.01106, ecapa_loss=0.0002034, whisper_loss=0.07657, over 13726.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01109, ecapa_loss=0.0001869, whisper_loss=0.09332, over 3930554.78 frames. ], batch size: 54, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:43:18,061 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 18 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 00:43:26,047 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
17 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 00:43:33,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1369520.0, ans=0.125 2024-08-12 00:43:49,508 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 00:44:05,842 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 00:44:06,999 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6550, loss[loss=0.1219, beats_loss=0.009916, ecapa_loss=0.0001893, whisper_loss=0.1101, over 22751.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01113, ecapa_loss=0.000186, whisper_loss=0.09308, over 3913484.22 frames. ], batch size: 88, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:44:13,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1369820.0, ans=0.0 2024-08-12 00:44:31,980 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 13 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 00:44:48,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.662e+01 3.000e+01 3.439e+01 5.833e+01, threshold=5.999e+01, percent-clipped=0.0 2024-08-12 00:44:52,997 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-12 00:44:53,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1370120.0, ans=0.125 2024-08-12 00:45:02,429 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
24 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 00:45:05,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1370220.0, ans=0.035 2024-08-12 00:45:15,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1370320.0, ans=0.125 2024-08-12 00:45:16,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6600, loss[loss=0.1301, beats_loss=0.007382, ecapa_loss=0.0002267, whisper_loss=0.1204, over 22232.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0111, ecapa_loss=0.0001877, whisper_loss=0.09325, over 3932481.28 frames. ], batch size: 88, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:45:38,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-08-12 00:45:39,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1370420.0, ans=0.1 2024-08-12 00:45:41,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1370420.0, ans=0.0 2024-08-12 00:46:17,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=22.5 2024-08-12 00:46:25,061 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6650, loss[loss=0.1319, beats_loss=0.009563, ecapa_loss=0.0002266, whisper_loss=0.1201, over 22563.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01111, ecapa_loss=0.0001891, whisper_loss=0.09363, over 3951981.35 frames. 
], batch size: 90, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:46:33,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1370820.0, ans=0.125 2024-08-12 00:46:33,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1370820.0, ans=0.125 2024-08-12 00:46:39,817 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 00:47:04,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1371020.0, ans=0.0 2024-08-12 00:47:06,632 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.593e+01 2.812e+01 3.124e+01 4.169e+01, threshold=5.623e+01, percent-clipped=0.0 2024-08-12 00:47:06,846 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 00:47:09,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1371120.0, ans=0.125 2024-08-12 00:47:15,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1371120.0, ans=0.1 2024-08-12 00:47:15,911 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.61 vs. limit=22.5 2024-08-12 00:47:33,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1371320.0, ans=0.125 2024-08-12 00:47:34,536 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6700, loss[loss=0.1115, beats_loss=0.01013, ecapa_loss=0.0002024, whisper_loss=0.09932, over 20315.00 frames. 
], tot_loss[loss=0.1062, beats_loss=0.01107, ecapa_loss=0.0001891, whisper_loss=0.09326, over 3911753.39 frames. ], batch size: 84, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:47:41,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.44 vs. limit=15.0 2024-08-12 00:47:50,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.04 vs. limit=22.5 2024-08-12 00:47:51,247 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-12 00:48:04,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1371520.0, ans=0.125 2024-08-12 00:48:14,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1371520.0, ans=0.1 2024-08-12 00:48:18,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1371620.0, ans=0.0 2024-08-12 00:48:21,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1371620.0, ans=0.05 2024-08-12 00:48:23,371 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.15 vs. limit=10.0 2024-08-12 00:48:24,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1371620.0, ans=0.125 2024-08-12 00:48:30,984 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 00:48:41,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1371720.0, ans=0.0 2024-08-12 00:48:44,864 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6750, loss[loss=0.1104, beats_loss=0.01176, ecapa_loss=0.0001811, whisper_loss=0.09678, over 22406.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01105, ecapa_loss=0.000188, whisper_loss=0.09382, over 3913125.15 frames. ], batch size: 89, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:48:48,055 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 15 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 00:49:01,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1371920.0, ans=0.0 2024-08-12 00:49:14,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1372020.0, ans=0.2 2024-08-12 00:49:26,544 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.541e+01 2.925e+01 3.464e+01 4.634e+01, threshold=5.851e+01, percent-clipped=0.0 2024-08-12 00:49:27,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1372120.0, ans=0.1 2024-08-12 00:49:31,367 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 00:49:35,166 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 00:49:49,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1372220.0, ans=0.0 2024-08-12 00:49:54,331 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6800, loss[loss=0.1184, beats_loss=0.007641, ecapa_loss=0.0002605, whisper_loss=0.1082, over 21748.00 frames. 
], tot_loss[loss=0.1068, beats_loss=0.01102, ecapa_loss=0.0001878, whisper_loss=0.09385, over 3922178.43 frames. ], batch size: 93, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:49:56,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1372320.0, ans=0.0 2024-08-12 00:50:05,472 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-12 00:50:08,255 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 00:50:13,375 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-12 00:50:17,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1372420.0, ans=0.125 2024-08-12 00:50:18,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1372420.0, ans=0.0 2024-08-12 00:50:26,399 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-12 00:50:33,397 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 00:50:45,641 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-12 00:50:48,614 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 13 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 00:50:58,176 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
36 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 00:50:58,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1372720.0, ans=0.0 2024-08-12 00:51:03,546 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6850, loss[loss=0.1089, beats_loss=0.0107, ecapa_loss=0.0001646, whisper_loss=0.0966, over 23145.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01104, ecapa_loss=0.0001886, whisper_loss=0.09344, over 3922562.95 frames. ], batch size: 92, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:51:19,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1372920.0, ans=0.125 2024-08-12 00:51:44,564 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.602e+01 2.969e+01 3.307e+01 6.186e+01, threshold=5.938e+01, percent-clipped=1.0 2024-08-12 00:51:45,582 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=12.0 2024-08-12 00:51:47,524 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 00:52:00,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2024-08-12 00:52:04,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1373220.0, ans=0.0 2024-08-12 00:52:05,934 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 00:52:12,483 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6900, loss[loss=0.08841, beats_loss=0.01194, ecapa_loss=0.0001962, whisper_loss=0.0745, over 16750.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01108, ecapa_loss=0.0001883, whisper_loss=0.09399, over 3942507.27 frames. 
], batch size: 67, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:52:27,100 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-12 00:52:41,451 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 00:52:55,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1373620.0, ans=0.0 2024-08-12 00:52:57,041 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-12 00:53:03,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-12 00:53:05,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1373620.0, ans=0.125 2024-08-12 00:53:08,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1373720.0, ans=0.0 2024-08-12 00:53:22,468 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-12 00:53:23,560 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 6950, loss[loss=0.1246, beats_loss=0.008065, ecapa_loss=0.0002167, whisper_loss=0.1144, over 15396.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01107, ecapa_loss=0.0001868, whisper_loss=0.09444, over 3948601.15 frames. ], batch size: 57, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:53:29,926 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.40 vs. 
limit=22.5 2024-08-12 00:53:35,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1373820.0, ans=0.125 2024-08-12 00:53:36,181 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-12 00:53:38,056 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-12 00:53:52,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1374020.0, ans=0.125 2024-08-12 00:54:05,812 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.522e+01 2.749e+01 3.045e+01 4.953e+01, threshold=5.497e+01, percent-clipped=0.0 2024-08-12 00:54:33,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7000, loss[loss=0.09258, beats_loss=0.01038, ecapa_loss=0.0002458, whisper_loss=0.07975, over 19846.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01111, ecapa_loss=0.0001864, whisper_loss=0.09386, over 3920432.07 frames. ], batch size: 84, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:54:44,804 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 00:54:49,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1374420.0, ans=0.125 2024-08-12 00:54:52,467 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.389e+02 2024-08-12 00:54:59,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1374420.0, ans=0.125 2024-08-12 00:55:15,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1374620.0, ans=0.2 2024-08-12 00:55:16,891 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 00:55:17,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1374620.0, ans=0.125 2024-08-12 00:55:24,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.02 vs. limit=6.0 2024-08-12 00:55:26,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1374620.0, ans=0.125 2024-08-12 00:55:41,952 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7050, loss[loss=0.09258, beats_loss=0.01133, ecapa_loss=0.0001768, whisper_loss=0.07948, over 19083.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0112, ecapa_loss=0.0001866, whisper_loss=0.09217, over 3915164.31 frames. ], batch size: 77, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:55:51,684 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 00:55:52,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1374820.0, ans=0.125 2024-08-12 00:55:58,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1374920.0, ans=0.125 2024-08-12 00:56:03,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1374920.0, ans=0.125 2024-08-12 00:56:08,568 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.88 vs. 
limit=15.0 2024-08-12 00:56:15,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1375020.0, ans=0.025 2024-08-12 00:56:15,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1375020.0, ans=0.0 2024-08-12 00:56:23,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.564e+01 2.939e+01 3.594e+01 1.844e+02, threshold=5.878e+01, percent-clipped=7.0 2024-08-12 00:56:34,292 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 00:56:45,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1375220.0, ans=0.125 2024-08-12 00:56:49,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1375220.0, ans=15.0 2024-08-12 00:56:50,738 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7100, loss[loss=0.1062, beats_loss=0.01045, ecapa_loss=0.0002117, whisper_loss=0.09367, over 19593.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01124, ecapa_loss=0.0001858, whisper_loss=0.0917, over 3919604.65 frames. ], batch size: 79, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:57:22,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1375520.0, ans=0.0 2024-08-12 00:57:34,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1375620.0, ans=0.125 2024-08-12 00:57:59,839 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7150, loss[loss=0.1197, beats_loss=0.01178, ecapa_loss=0.0001568, whisper_loss=0.1064, over 22381.00 frames. 
], tot_loss[loss=0.1052, beats_loss=0.01122, ecapa_loss=0.0001864, whisper_loss=0.09215, over 3940523.64 frames. ], batch size: 86, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:58:11,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1375820.0, ans=0.025 2024-08-12 00:58:16,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1375920.0, ans=0.125 2024-08-12 00:58:35,048 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 00:58:36,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1376020.0, ans=0.0 2024-08-12 00:58:42,255 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.592e+01 2.864e+01 3.293e+01 5.608e+01, threshold=5.729e+01, percent-clipped=0.0 2024-08-12 00:58:50,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1376120.0, ans=0.125 2024-08-12 00:58:56,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1376220.0, ans=0.125 2024-08-12 00:59:01,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1376220.0, ans=0.04949747468305833 2024-08-12 00:59:09,140 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7200, loss[loss=0.1156, beats_loss=0.01074, ecapa_loss=0.0002046, whisper_loss=0.1028, over 21423.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01111, ecapa_loss=0.0001863, whisper_loss=0.09304, over 3929810.65 frames. 
], batch size: 89, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:59:16,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1376320.0, ans=0.125 2024-08-12 00:59:17,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.48 vs. limit=15.0 2024-08-12 00:59:17,738 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-12 00:59:33,171 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 00:59:39,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1376520.0, ans=0.125 2024-08-12 00:59:48,484 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-12 00:59:51,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1376620.0, ans=0.95 2024-08-12 01:00:17,921 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7250, loss[loss=0.1085, beats_loss=0.01229, ecapa_loss=0.0001567, whisper_loss=0.09464, over 17000.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01116, ecapa_loss=0.0001863, whisper_loss=0.09285, over 3911851.25 frames. ], batch size: 68, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:00:22,819 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-08-12 01:00:25,046 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 01:00:39,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.91 vs. 
limit=15.0 2024-08-12 01:00:47,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1377020.0, ans=0.1 2024-08-12 01:00:49,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1377020.0, ans=0.015 2024-08-12 01:00:59,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.509e+01 2.818e+01 3.163e+01 4.594e+01, threshold=5.637e+01, percent-clipped=0.0 2024-08-12 01:01:01,724 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2024-08-12 01:01:14,679 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 01:01:15,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1377220.0, ans=0.0 2024-08-12 01:01:15,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1377220.0, ans=0.125 2024-08-12 01:01:17,280 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 01:01:18,781 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-12 01:01:27,378 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7300, loss[loss=0.08204, beats_loss=0.01305, ecapa_loss=0.0001591, whisper_loss=0.06739, over 20230.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01114, ecapa_loss=0.0001866, whisper_loss=0.09268, over 3926646.07 frames. ], batch size: 82, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:01:40,148 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.20 vs. 
limit=22.5 2024-08-12 01:01:49,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1377420.0, ans=0.125 2024-08-12 01:02:15,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1377620.0, ans=0.0 2024-08-12 01:02:19,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1377620.0, ans=0.125 2024-08-12 01:02:29,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1377720.0, ans=0.1 2024-08-12 01:02:29,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=1377720.0, ans=0.02 2024-08-12 01:02:34,673 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-12 01:02:37,280 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7350, loss[loss=0.1074, beats_loss=0.012, ecapa_loss=0.0001612, whisper_loss=0.09381, over 20522.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01122, ecapa_loss=0.0001853, whisper_loss=0.09207, over 3901676.78 frames. ], batch size: 79, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:02:40,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1377820.0, ans=0.125 2024-08-12 01:02:46,655 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 01:02:47,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2024-08-12 01:02:54,434 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.15 vs. 
limit=15.0 2024-08-12 01:03:18,942 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.545e+01 2.938e+01 3.274e+01 5.414e+01, threshold=5.876e+01, percent-clipped=0.0 2024-08-12 01:03:44,400 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=12.0 2024-08-12 01:03:46,248 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7400, loss[loss=0.1107, beats_loss=0.01204, ecapa_loss=0.0002006, whisper_loss=0.09669, over 21251.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01117, ecapa_loss=0.0001872, whisper_loss=0.09267, over 3871362.13 frames. ], batch size: 89, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:03:46,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1378320.0, ans=0.125 2024-08-12 01:03:50,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1378320.0, ans=0.125 2024-08-12 01:04:00,384 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-12 01:04:01,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1378420.0, ans=0.125 2024-08-12 01:04:19,829 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 01:04:22,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1378520.0, ans=0.125 2024-08-12 01:04:25,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1378520.0, ans=0.1 2024-08-12 01:04:53,900 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 01:04:54,972 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7450, loss[loss=0.1089, beats_loss=0.01104, ecapa_loss=0.0002097, whisper_loss=0.09581, over 21553.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01115, ecapa_loss=0.0001868, whisper_loss=0.09272, over 3901998.30 frames. ], batch size: 92, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:05:00,931 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 01:05:01,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1378820.0, ans=0.1 2024-08-12 01:05:05,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1378820.0, ans=0.05 2024-08-12 01:05:13,218 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-12 01:05:26,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1379020.0, ans=0.125 2024-08-12 01:05:33,767 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 01:05:36,046 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.504e+01 2.763e+01 3.240e+01 5.325e+01, threshold=5.527e+01, percent-clipped=0.0 2024-08-12 01:05:39,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1379120.0, ans=0.1 2024-08-12 01:05:45,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1379120.0, ans=0.1 2024-08-12 01:05:45,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1379120.0, ans=0.025 2024-08-12 01:05:52,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1379220.0, ans=0.125 2024-08-12 01:05:54,182 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.73 vs. limit=6.0 2024-08-12 01:06:02,273 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-12 01:06:04,698 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7500, loss[loss=0.09594, beats_loss=0.009848, ecapa_loss=0.000188, whisper_loss=0.08421, over 18019.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01105, ecapa_loss=0.0001876, whisper_loss=0.09314, over 3894569.45 frames. 
], batch size: 69, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:06:35,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1379520.0, ans=0.125 2024-08-12 01:06:53,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1379620.0, ans=0.2 2024-08-12 01:06:55,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1379620.0, ans=0.125 2024-08-12 01:06:55,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2024-08-12 01:07:05,425 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 01:07:12,615 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 01:07:13,804 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-12 01:07:16,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7550, loss[loss=0.08647, beats_loss=0.009826, ecapa_loss=0.0002582, whisper_loss=0.07406, over 14164.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01115, ecapa_loss=0.0001885, whisper_loss=0.09154, over 3866469.25 frames. ], batch size: 63, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:07:29,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1379920.0, ans=0.125 2024-08-12 01:07:44,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1380020.0, ans=0.2 2024-08-12 01:07:47,135 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-12 01:07:57,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1380020.0, ans=0.125 2024-08-12 01:07:59,381 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.522e+01 2.796e+01 3.153e+01 8.804e+01, threshold=5.592e+01, percent-clipped=1.0 2024-08-12 01:08:01,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1380120.0, ans=0.1 2024-08-12 01:08:05,488 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-12 01:08:10,084 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 01:08:10,724 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.01 vs. limit=15.0 2024-08-12 01:08:28,724 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7600, loss[loss=0.1181, beats_loss=0.009782, ecapa_loss=0.0001873, whisper_loss=0.1064, over 22877.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01109, ecapa_loss=0.0001883, whisper_loss=0.09126, over 3840694.02 frames. ], batch size: 92, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:08:30,684 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 01:08:40,196 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=22.5 2024-08-12 01:08:55,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.03 vs. 
limit=15.0 2024-08-12 01:09:02,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1380520.0, ans=0.125 2024-08-12 01:09:06,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1380520.0, ans=0.0 2024-08-12 01:09:07,123 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 01:09:09,647 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 01:09:25,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1380620.0, ans=0.1 2024-08-12 01:09:37,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-12 01:09:42,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1380720.0, ans=0.125 2024-08-12 01:09:44,251 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7650, loss[loss=0.1205, beats_loss=0.008677, ecapa_loss=0.0001981, whisper_loss=0.1098, over 22200.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01104, ecapa_loss=0.0001877, whisper_loss=0.09176, over 3859807.26 frames. ], batch size: 87, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:09:53,346 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 01:09:53,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1380820.0, ans=0.0 2024-08-12 01:10:12,896 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 01:10:16,315 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
20 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-12 01:10:16,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1381020.0, ans=0.125 2024-08-12 01:10:31,002 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.632e+01 2.933e+01 3.294e+01 6.262e+01, threshold=5.865e+01, percent-clipped=1.0 2024-08-12 01:10:41,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1381120.0, ans=0.125 2024-08-12 01:10:45,905 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 01:10:50,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1381220.0, ans=0.05 2024-08-12 01:10:51,975 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 01:11:02,632 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7700, loss[loss=0.1068, beats_loss=0.01225, ecapa_loss=0.0001629, whisper_loss=0.09295, over 21679.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01104, ecapa_loss=0.0001876, whisper_loss=0.09182, over 3874707.89 frames. ], batch size: 87, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:11:04,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1381320.0, ans=0.125 2024-08-12 01:11:11,646 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 01:11:15,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1381320.0, ans=0.125 2024-08-12 01:11:23,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1381420.0, ans=0.07 2024-08-12 01:11:33,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1381520.0, ans=0.1 2024-08-12 01:11:37,102 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.46 vs. limit=22.5 2024-08-12 01:11:41,344 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 01:11:48,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1381620.0, ans=0.125 2024-08-12 01:11:57,778 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 01:11:59,536 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-12 01:12:10,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2024-08-12 01:12:12,209 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 01:12:15,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1381820.0, ans=0.0 2024-08-12 01:12:16,424 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7750, loss[loss=0.1039, beats_loss=0.01301, ecapa_loss=0.0001505, whisper_loss=0.08934, over 16177.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01104, ecapa_loss=0.0001883, whisper_loss=0.09151, over 3866758.56 frames. ], batch size: 62, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:12:21,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.34 vs. limit=12.0 2024-08-12 01:12:41,627 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.95 vs. limit=15.0 2024-08-12 01:12:56,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1382020.0, ans=0.125 2024-08-12 01:13:00,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.543e+01 2.861e+01 3.273e+01 8.260e+01, threshold=5.723e+01, percent-clipped=1.0 2024-08-12 01:13:17,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1382220.0, ans=0.5 2024-08-12 01:13:31,318 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7800, loss[loss=0.1007, beats_loss=0.01374, ecapa_loss=0.0001483, whisper_loss=0.08548, over 22188.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01113, ecapa_loss=0.0001867, whisper_loss=0.09114, over 3876356.86 frames. ], batch size: 90, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:13:33,741 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.90 vs. limit=15.0 2024-08-12 01:13:44,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1382420.0, ans=0.1 2024-08-12 01:13:54,894 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 01:14:30,935 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-12 01:14:40,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1382720.0, ans=0.125 2024-08-12 01:14:45,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7850, loss[loss=0.1119, beats_loss=0.008806, ecapa_loss=0.0001935, whisper_loss=0.1011, over 17062.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01112, ecapa_loss=0.0001873, whisper_loss=0.09119, over 3852651.85 frames. ], batch size: 65, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:15:13,252 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 01:15:16,137 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-12 01:15:17,631 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 01:15:25,139 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 01:15:29,416 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.565e+01 2.814e+01 3.165e+01 4.880e+01, threshold=5.628e+01, percent-clipped=0.0 2024-08-12 01:15:30,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2024-08-12 01:15:49,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1383220.0, ans=0.1 2024-08-12 01:15:58,346 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7900, loss[loss=0.1159, beats_loss=0.01206, ecapa_loss=0.0001617, whisper_loss=0.1022, over 20629.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01117, ecapa_loss=0.0001875, whisper_loss=0.0918, over 3867804.14 frames. 
], batch size: 81, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:15:58,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1383320.0, ans=0.0 2024-08-12 01:16:00,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1383320.0, ans=0.125 2024-08-12 01:16:07,769 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 01:16:12,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1383420.0, ans=0.125 2024-08-12 01:16:15,831 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-12 01:16:36,063 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.46 vs. limit=15.0 2024-08-12 01:16:37,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0 2024-08-12 01:16:41,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1383520.0, ans=0.2 2024-08-12 01:16:55,393 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 01:17:03,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1383720.0, ans=0.125 2024-08-12 01:17:04,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1383720.0, ans=0.0 2024-08-12 01:17:10,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1383720.0, ans=0.125 2024-08-12 01:17:10,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1383720.0, ans=22.5 2024-08-12 01:17:11,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1383820.0, ans=0.0 2024-08-12 01:17:12,850 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 7950, loss[loss=0.1251, beats_loss=0.01092, ecapa_loss=0.0001638, whisper_loss=0.1125, over 22950.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01117, ecapa_loss=0.0001868, whisper_loss=0.09184, over 3837691.82 frames. ], batch size: 89, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:17:13,005 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-12 01:17:24,039 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 01:17:57,305 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.551e+01 2.931e+01 3.391e+01 6.201e+01, threshold=5.862e+01, percent-clipped=1.0 2024-08-12 01:18:02,264 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
25 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-12 01:18:07,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1384120.0, ans=0.1 2024-08-12 01:18:24,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1384220.0, ans=0.07 2024-08-12 01:18:26,634 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8000, loss[loss=0.1191, beats_loss=0.00863, ecapa_loss=0.000215, whisper_loss=0.1083, over 15534.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01112, ecapa_loss=0.0001856, whisper_loss=0.09253, over 3845072.20 frames. ], batch size: 60, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:18:35,802 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 01:18:38,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1384320.0, ans=0.0 2024-08-12 01:18:54,895 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 01:18:56,211 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 01:19:01,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-08-12 01:19:08,626 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
28 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-12 01:19:31,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1384720.0, ans=0.125 2024-08-12 01:19:32,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1384720.0, ans=0.125 2024-08-12 01:19:39,244 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8050, loss[loss=0.1107, beats_loss=0.01078, ecapa_loss=0.000189, whisper_loss=0.09804, over 21670.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01109, ecapa_loss=0.0001845, whisper_loss=0.0928, over 3850363.16 frames. ], batch size: 86, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:19:46,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1384820.0, ans=0.125 2024-08-12 01:20:02,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1384920.0, ans=0.0 2024-08-12 01:20:18,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1385020.0, ans=0.2 2024-08-12 01:20:21,932 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
16 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 01:20:22,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1385120.0, ans=0.125 2024-08-12 01:20:22,980 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.542e+01 2.903e+01 3.299e+01 4.788e+01, threshold=5.807e+01, percent-clipped=0.0 2024-08-12 01:20:43,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1385220.0, ans=0.2 2024-08-12 01:20:50,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1385320.0, ans=0.0 2024-08-12 01:20:51,532 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8100, loss[loss=0.09251, beats_loss=0.01381, ecapa_loss=0.0001445, whisper_loss=0.07726, over 13596.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01117, ecapa_loss=0.0001847, whisper_loss=0.09221, over 3848141.03 frames. ], batch size: 54, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:20:51,753 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 01:20:53,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1385320.0, ans=0.09899494936611666 2024-08-12 01:20:56,101 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 01:21:00,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1385320.0, ans=0.0 2024-08-12 01:21:06,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1385420.0, ans=0.2 2024-08-12 01:21:07,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1385420.0, ans=0.0 2024-08-12 01:21:15,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.34 vs. limit=22.5 2024-08-12 01:21:24,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1385520.0, ans=0.125 2024-08-12 01:21:30,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1385520.0, ans=15.0 2024-08-12 01:21:32,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1385520.0, ans=0.0 2024-08-12 01:21:58,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1385720.0, ans=0.1 2024-08-12 01:21:59,126 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-08-12 01:22:04,120 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8150, loss[loss=0.1046, beats_loss=0.01167, ecapa_loss=0.0001838, whisper_loss=0.09107, over 22669.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01118, ecapa_loss=0.0001845, whisper_loss=0.09219, over 3842771.27 frames. 
], batch size: 93, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:22:13,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.84 vs. limit=15.0 2024-08-12 01:22:16,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1385820.0, ans=0.0 2024-08-12 01:22:23,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1385920.0, ans=0.125 2024-08-12 01:22:33,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-12 01:22:35,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1386020.0, ans=0.125 2024-08-12 01:22:44,969 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 from AS 2024-08-12 01:22:45,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1386020.0, ans=22.5 2024-08-12 01:22:47,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.599e+01 2.928e+01 3.345e+01 4.607e+01, threshold=5.855e+01, percent-clipped=0.0 2024-08-12 01:22:49,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1386120.0, ans=0.0 2024-08-12 01:22:51,499 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
31 from LS+wenet, 16 from Vox, 40 from AS 2024-08-12 01:23:13,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1386220.0, ans=0.125 2024-08-12 01:23:15,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1386220.0, ans=0.1 2024-08-12 01:23:17,447 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8200, loss[loss=0.1216, beats_loss=0.009338, ecapa_loss=0.0002177, whisper_loss=0.1101, over 18682.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01113, ecapa_loss=0.0001837, whisper_loss=0.09337, over 3898868.56 frames. ], batch size: 75, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:23:28,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2024-08-12 01:23:49,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0 2024-08-12 01:23:51,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1386520.0, ans=0.125 2024-08-12 01:23:52,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1386520.0, ans=0.125 2024-08-12 01:23:54,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1386520.0, ans=0.1 2024-08-12 01:24:11,399 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 16 from Vox, 23 from AS 2024-08-12 01:24:14,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1386620.0, ans=0.125 2024-08-12 01:24:22,430 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
28 from LS+wenet, 20 from Vox, 40 from AS 2024-08-12 01:24:23,567 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 19 from Vox, 22 from AS 2024-08-12 01:24:27,565 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 14 from Vox, 31 from AS 2024-08-12 01:24:29,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1386720.0, ans=0.125 2024-08-12 01:24:32,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8250, loss[loss=0.1266, beats_loss=0.00783, ecapa_loss=0.0001784, whisper_loss=0.117, over 23871.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01105, ecapa_loss=0.0001845, whisper_loss=0.09406, over 3908681.74 frames. ], batch size: 89, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:24:34,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1386820.0, ans=0.125 2024-08-12 01:24:34,392 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2024-08-12 01:24:39,509 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 from AS 2024-08-12 01:24:43,665 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 21 from Vox, 38 from AS 2024-08-12 01:24:57,603 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.103e+00 2024-08-12 01:25:16,208 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.606e+01 2.891e+01 3.345e+01 5.457e+01, threshold=5.782e+01, percent-clipped=0.0 2024-08-12 01:25:23,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1387120.0, ans=0.015 2024-08-12 01:25:37,856 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
26 from LS+wenet, 23 from Vox, 41 from AS 2024-08-12 01:25:39,166 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 14 from LS+wenet, 20 from Vox, 31 from AS 2024-08-12 01:25:46,388 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8300, loss[loss=0.1042, beats_loss=0.009836, ecapa_loss=0.0001965, whisper_loss=0.09242, over 15088.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01112, ecapa_loss=0.000184, whisper_loss=0.09318, over 3888940.02 frames. ], batch size: 58, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:25:46,599 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 26 from LS+wenet, 12 from Vox, 26 from AS 2024-08-12 01:26:10,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1387420.0, ans=0.0 2024-08-12 01:26:12,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2024-08-12 01:26:13,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1387420.0, ans=0.0 2024-08-12 01:26:19,498 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 from AS 2024-08-12 01:26:48,517 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 from AS 2024-08-12 01:26:54,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1387720.0, ans=0.125 2024-08-12 01:26:56,029 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 26 from LS+wenet, 12 from Vox, 25 from AS 2024-08-12 01:27:02,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8350, loss[loss=0.1156, beats_loss=0.008493, ecapa_loss=0.0002068, whisper_loss=0.1051, over 19503.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01111, ecapa_loss=0.0001849, whisper_loss=0.09281, over 3866037.29 frames.
], batch size: 78, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:27:32,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.60 vs. limit=15.0 2024-08-12 01:27:41,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1388020.0, ans=0.1 2024-08-12 01:27:47,169 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.752e+01 3.106e+01 3.684e+01 1.573e+02, threshold=6.213e+01, percent-clipped=3.0 2024-08-12 01:27:49,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1388120.0, ans=0.05 2024-08-12 01:27:52,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1388120.0, ans=0.1 2024-08-12 01:28:02,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2024-08-12 01:28:08,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1388220.0, ans=0.125 2024-08-12 01:28:13,058 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 from AS 2024-08-12 01:28:16,807 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8400, loss[loss=0.1505, beats_loss=0.007349, ecapa_loss=0.0002407, whisper_loss=0.1408, over 18156.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01106, ecapa_loss=0.0001857, whisper_loss=0.0938, over 3874944.02 frames. ], batch size: 72, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:28:52,930 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts.
17 from LS+wenet, 14 from Vox, 26 from AS 2024-08-12 01:29:03,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1388620.0, ans=0.0 2024-08-12 01:29:13,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1388620.0, ans=0.125 2024-08-12 01:29:29,432 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8450, loss[loss=0.1047, beats_loss=0.01116, ecapa_loss=0.0002012, whisper_loss=0.09149, over 21422.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01106, ecapa_loss=0.0001856, whisper_loss=0.09331, over 3878023.62 frames. ], batch size: 88, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:29:35,288 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 from AS 2024-08-12 01:29:53,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1388920.0, ans=0.09899494936611666 2024-08-12 01:30:06,938 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 from AS 2024-08-12 01:30:12,256 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 2.661e+01 3.023e+01 3.413e+01 6.376e+01, threshold=6.047e+01, percent-clipped=1.0 2024-08-12 01:30:21,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1389120.0, ans=0.1 2024-08-12 01:30:22,773 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 18 from LS+wenet, 29 from Vox, 40 from AS 2024-08-12 01:30:28,187 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 from AS 2024-08-12 01:30:29,508 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 25 from Vox, 26 from AS 2024-08-12 01:30:39,004 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
31 from LS+wenet, 23 from Vox, 34 from AS 2024-08-12 01:30:39,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1389320.0, ans=0.125 2024-08-12 01:30:40,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8500, loss[loss=0.1154, beats_loss=0.009753, ecapa_loss=0.0002014, whisper_loss=0.1036, over 21195.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01108, ecapa_loss=0.0001861, whisper_loss=0.09273, over 3886221.27 frames. ], batch size: 88, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:30:51,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1389320.0, ans=0.0 2024-08-12 01:30:54,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.27 vs. limit=22.5 2024-08-12 01:31:03,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1389420.0, ans=0.0 2024-08-12 01:31:20,542 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 23 from LS+wenet, 10 from Vox, 22 from AS 2024-08-12 01:31:29,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1389620.0, ans=0.0 2024-08-12 01:31:39,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1389720.0, ans=0.125 2024-08-12 01:31:45,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=1389720.0, ans=15.0 2024-08-12 01:31:52,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8550, loss[loss=0.1022, beats_loss=0.01353, ecapa_loss=0.0001461, whisper_loss=0.0872, over 23359.00 frames.
], tot_loss[loss=0.1061, beats_loss=0.01109, ecapa_loss=0.0001844, whisper_loss=0.09314, over 3894349.78 frames. ], batch size: 91, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:31:55,693 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 from AS 2024-08-12 01:31:58,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1389820.0, ans=0.125 2024-08-12 01:32:02,876 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 9 from Vox, 27 from AS 2024-08-12 01:32:07,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1389920.0, ans=0.2 2024-08-12 01:32:13,246 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 21 from Vox, 28 from AS 2024-08-12 01:32:20,057 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 23 from Vox, 43 from AS 2024-08-12 01:32:36,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1390120.0, ans=0.09899494936611666 2024-08-12 01:32:37,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.566e+01 2.875e+01 3.249e+01 7.628e+01, threshold=5.750e+01, percent-clipped=1.0 2024-08-12 01:32:40,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1390120.0, ans=0.125 2024-08-12 01:32:43,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1390120.0, ans=0.0 2024-08-12 01:32:49,791 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
26 from LS+wenet, 25 from Vox, 40 from AS 2024-08-12 01:32:56,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1390220.0, ans=0.1 2024-08-12 01:33:03,809 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8600, loss[loss=0.1039, beats_loss=0.0126, ecapa_loss=0.0001869, whisper_loss=0.08944, over 21688.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01106, ecapa_loss=0.000186, whisper_loss=0.09297, over 3890221.93 frames. ], batch size: 90, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:33:05,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1390320.0, ans=0.125 2024-08-12 01:33:13,824 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 from AS 2024-08-12 01:33:23,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.82 vs. limit=15.0 2024-08-12 01:33:23,857 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 from AS 2024-08-12 01:33:37,050 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 from AS 2024-08-12 01:33:38,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1390520.0, ans=0.125 2024-08-12 01:33:45,276 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 from AS 2024-08-12 01:33:59,233 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.39 vs.
limit=15.0 2024-08-12 01:33:59,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1390720.0, ans=0.125 2024-08-12 01:34:04,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1390720.0, ans=0.1 2024-08-12 01:34:14,018 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8650, loss[loss=0.08333, beats_loss=0.01124, ecapa_loss=0.0001828, whisper_loss=0.07026, over 17968.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01101, ecapa_loss=0.0001867, whisper_loss=0.09327, over 3852566.31 frames. ], batch size: 73, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:34:19,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2024-08-12 01:34:28,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1390920.0, ans=0.1 2024-08-12 01:34:30,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1390920.0, ans=0.0 2024-08-12 01:34:32,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1390920.0, ans=0.2 2024-08-12 01:34:34,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1390920.0, ans=0.125 2024-08-12 01:34:57,445 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.624e+01 3.118e+01 3.764e+01 6.887e+01, threshold=6.237e+01, percent-clipped=2.0 2024-08-12 01:35:04,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1391120.0, ans=0.125 2024-08-12 01:35:10,408 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
23 from LS+wenet, 18 from Vox, 33 from AS 2024-08-12 01:35:25,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8700, loss[loss=0.1086, beats_loss=0.009679, ecapa_loss=0.0002063, whisper_loss=0.0969, over 23242.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01107, ecapa_loss=0.000188, whisper_loss=0.09297, over 3850055.87 frames. ], batch size: 94, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:35:35,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1391320.0, ans=0.125 2024-08-12 01:35:36,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1391320.0, ans=22.5 2024-08-12 01:35:50,210 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 from AS 2024-08-12 01:35:59,334 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 from AS 2024-08-12 01:36:32,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1391720.0, ans=0.0 2024-08-12 01:36:35,138 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 17 from Vox, 42 from AS 2024-08-12 01:36:39,467 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8750, loss[loss=0.1174, beats_loss=0.01006, ecapa_loss=0.0001799, whisper_loss=0.1056, over 23493.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01101, ecapa_loss=0.0001873, whisper_loss=0.09374, over 3853939.66 frames. ], batch size: 93, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:36:41,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1391820.0, ans=0.5 2024-08-12 01:36:41,733 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.54 vs.
limit=15.0 2024-08-12 01:36:47,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1391820.0, ans=0.05 2024-08-12 01:37:00,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1391920.0, ans=0.2 2024-08-12 01:37:05,940 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5 2024-08-12 01:37:24,519 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 from AS 2024-08-12 01:37:25,928 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.651e+01 2.928e+01 3.365e+01 6.201e+01, threshold=5.855e+01, percent-clipped=0.0 2024-08-12 01:37:37,582 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 24 from Vox, 24 from AS 2024-08-12 01:37:54,019 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8800, loss[loss=0.07163, beats_loss=0.01444, ecapa_loss=0.0002013, whisper_loss=0.05518, over 21103.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01099, ecapa_loss=0.0001869, whisper_loss=0.09354, over 3850208.11 frames. ], batch size: 91, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:37:54,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1392320.0, ans=0.125 2024-08-12 01:37:55,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1392320.0, ans=0.0 2024-08-12 01:38:02,987 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 35 from LS+wenet, 23 from Vox, 29 from AS 2024-08-12 01:38:10,894 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
27 from LS+wenet, 25 from Vox, 38 from AS 2024-08-12 01:38:54,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1392720.0, ans=0.0 2024-08-12 01:38:58,436 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 from AS 2024-08-12 01:39:03,517 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 from AS 2024-08-12 01:39:08,893 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8850, loss[loss=0.1027, beats_loss=0.01283, ecapa_loss=0.000122, whisper_loss=0.08864, over 18289.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01101, ecapa_loss=0.0001859, whisper_loss=0.09342, over 3854651.28 frames. ], batch size: 69, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:39:11,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=12.0 2024-08-12 01:39:24,664 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.91 vs. limit=15.0 2024-08-12 01:39:27,926 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 from AS 2024-08-12 01:39:52,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1393120.0, ans=0.2 2024-08-12 01:39:53,361 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.605e+01 2.898e+01 3.315e+01 6.590e+01, threshold=5.796e+01, percent-clipped=1.0 2024-08-12 01:39:57,919 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts.
26 from LS+wenet, 16 from Vox, 35 from AS 2024-08-12 01:39:59,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1393120.0, ans=0.125 2024-08-12 01:40:02,256 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.328e-02 2024-08-12 01:40:06,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1393220.0, ans=0.0 2024-08-12 01:40:10,037 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2024-08-12 01:40:16,438 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 24 from Vox, 39 from AS 2024-08-12 01:40:20,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8900, loss[loss=0.103, beats_loss=0.01165, ecapa_loss=0.0001856, whisper_loss=0.08945, over 21876.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01106, ecapa_loss=0.000184, whisper_loss=0.09296, over 3842101.00 frames. ], batch size: 88, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:40:22,112 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 from AS 2024-08-12 01:40:33,545 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 from AS 2024-08-12 01:40:39,153 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts.
23 from LS+wenet, 11 from Vox, 36 from AS 2024-08-12 01:40:39,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1393420.0, ans=0.125 2024-08-12 01:40:55,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1393520.0, ans=0.125 2024-08-12 01:40:58,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1393520.0, ans=0.125 2024-08-12 01:40:59,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2024-08-12 01:41:04,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1393620.0, ans=0.95 2024-08-12 01:41:05,793 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 from AS 2024-08-12 01:41:07,747 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5 2024-08-12 01:41:15,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1393720.0, ans=0.125 2024-08-12 01:41:25,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1393720.0, ans=0.1 2024-08-12 01:41:27,114 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 36 from LS+wenet, 13 from Vox, 28 from AS 2024-08-12 01:41:31,040 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 8950, loss[loss=0.1206, beats_loss=0.008038, ecapa_loss=0.0001714, whisper_loss=0.1108, over 19247.00 frames.
], tot_loss[loss=0.1066, beats_loss=0.01104, ecapa_loss=0.000184, whisper_loss=0.09375, over 3835316.89 frames. ], batch size: 74, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:41:33,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.57 vs. limit=10.0 2024-08-12 01:41:35,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1393820.0, ans=0.125 2024-08-12 01:41:49,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1393920.0, ans=0.125 2024-08-12 01:41:56,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1394020.0, ans=0.125 2024-08-12 01:42:13,562 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.694e+01 3.111e+01 3.699e+01 1.037e+02, threshold=6.222e+01, percent-clipped=1.0 2024-08-12 01:42:13,748 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 from AS 2024-08-12 01:42:15,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1394120.0, ans=0.0 2024-08-12 01:42:21,599 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 from AS 2024-08-12 01:42:37,944 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 26 from LS+wenet, 30 from Vox, 39 from AS 2024-08-12 01:42:38,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9000, loss[loss=0.09577, beats_loss=0.01161, ecapa_loss=0.0002149, whisper_loss=0.08202, over 21792.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01099, ecapa_loss=0.0001844, whisper_loss=0.09409, over 3900362.89 frames.
], batch size: 95, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:42:38,976 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 01:42:54,463 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0940, 2.6513, 4.5178, 4.3707], device='cuda:2') 2024-08-12 01:43:16,629 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on ASR_libri: loss=0.2567, beats_loss=0, ecapa_loss=0.0006076, whisper_loss=0.2507, over 922467.00 frames. 2024-08-12 01:43:34,728 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on SV_voxceleb1: loss=0.005114, beats_loss=0, ecapa_loss=0.0005114, whisper_loss=0, over 939242.00 frames. 2024-08-12 01:44:35,671 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4709, 1.7219, 1.8675, 1.0926], device='cuda:2') 2024-08-12 01:45:19,166 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on AT_audioset: loss=0.02463, beats_loss=0.02463, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 01:45:19,176 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 01:45:38,428 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 from AS 2024-08-12 01:45:45,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1394520.0, ans=0.125 2024-08-12 01:45:51,373 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 15 from Vox, 50 from AS 2024-08-12 01:46:02,542 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 24 from Vox, 19 from AS 2024-08-12 01:46:08,858 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.47 vs.
limit=12.0 2024-08-12 01:46:26,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1394720.0, ans=0.1 2024-08-12 01:46:26,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1394720.0, ans=0.2 2024-08-12 01:46:28,853 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9050, loss[loss=0.1179, beats_loss=0.01127, ecapa_loss=0.0002079, whisper_loss=0.1046, over 22004.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.011, ecapa_loss=0.0001849, whisper_loss=0.09383, over 3898591.97 frames. ], batch size: 91, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:46:37,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1394820.0, ans=0.125 2024-08-12 01:46:45,806 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 from AS 2024-08-12 01:47:02,490 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 12 from Vox, 24 from AS 2024-08-12 01:47:11,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.578e+01 2.935e+01 3.281e+01 5.128e+01, threshold=5.870e+01, percent-clipped=0.0 2024-08-12 01:47:13,496 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 26 from Vox, 26 from AS 2024-08-12 01:47:37,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9100, loss[loss=0.1204, beats_loss=0.01127, ecapa_loss=0.0001786, whisper_loss=0.1073, over 22739.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01101, ecapa_loss=0.000184, whisper_loss=0.09385, over 3871958.83 frames.
], batch size: 89, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:47:53,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1395420.0, ans=0.125 2024-08-12 01:48:04,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1395520.0, ans=0.125 2024-08-12 01:48:12,260 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 01:48:15,786 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2024-08-12 01:48:21,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1395620.0, ans=0.1 2024-08-12 01:48:25,308 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 01:48:36,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.36 vs. limit=15.0 2024-08-12 01:48:45,534 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9150, loss[loss=0.09872, beats_loss=0.01262, ecapa_loss=0.0002265, whisper_loss=0.08383, over 20202.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01111, ecapa_loss=0.0001843, whisper_loss=0.0935, over 3865321.76 frames. ], batch size: 88, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:48:50,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1395820.0, ans=0.2 2024-08-12 01:49:00,914 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
23 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-12 01:49:18,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1396020.0, ans=0.125 2024-08-12 01:49:25,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2024-08-12 01:49:28,442 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.582e+01 2.877e+01 3.376e+01 5.392e+01, threshold=5.754e+01, percent-clipped=0.0 2024-08-12 01:49:43,360 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 01:49:44,854 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.544e-02 2024-08-12 01:49:51,192 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 01:49:51,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1396220.0, ans=0.125 2024-08-12 01:49:53,980 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9200, loss[loss=0.1063, beats_loss=0.0119, ecapa_loss=0.0001665, whisper_loss=0.09278, over 21844.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01106, ecapa_loss=0.0001846, whisper_loss=0.09397, over 3874972.67 frames. ], batch size: 85, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:49:54,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2024-08-12 01:50:02,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.44 vs. 
limit=15.0 2024-08-12 01:50:31,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1396520.0, ans=0.125 2024-08-12 01:50:34,375 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 01:50:36,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1396620.0, ans=0.5 2024-08-12 01:50:48,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0 2024-08-12 01:51:02,549 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9250, loss[loss=0.1148, beats_loss=0.01254, ecapa_loss=0.000149, whisper_loss=0.1008, over 22646.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01115, ecapa_loss=0.0001849, whisper_loss=0.09319, over 3882147.28 frames. ], batch size: 89, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:51:10,172 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.66 vs. limit=22.5 2024-08-12 01:51:16,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2024-08-12 01:51:23,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1396920.0, ans=0.04949747468305833 2024-08-12 01:51:29,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1397020.0, ans=0.125 2024-08-12 01:51:30,724 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 01:51:30,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1397020.0, ans=0.2 2024-08-12 01:51:37,616 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 01:51:44,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.700e+01 2.936e+01 3.310e+01 8.820e+01, threshold=5.872e+01, percent-clipped=1.0 2024-08-12 01:51:47,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1397120.0, ans=0.125 2024-08-12 01:51:48,793 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 01:51:49,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1397120.0, ans=10.0 2024-08-12 01:52:02,340 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-12 01:52:08,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1397220.0, ans=0.125 2024-08-12 01:52:10,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9300, loss[loss=0.09016, beats_loss=0.01054, ecapa_loss=0.0002156, whisper_loss=0.07747, over 16053.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01116, ecapa_loss=0.0001846, whisper_loss=0.09257, over 3882393.81 frames. ], batch size: 63, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:52:44,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1397520.0, ans=0.125 2024-08-12 01:53:02,233 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.69 vs. 
limit=15.0 2024-08-12 01:53:19,477 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9350, loss[loss=0.08636, beats_loss=0.01138, ecapa_loss=0.0001867, whisper_loss=0.07311, over 15913.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01112, ecapa_loss=0.000185, whisper_loss=0.09269, over 3867046.02 frames. ], batch size: 64, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:53:24,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1397820.0, ans=0.125 2024-08-12 01:53:29,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1397820.0, ans=0.0 2024-08-12 01:53:29,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2024-08-12 01:53:32,135 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 01:53:39,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1397920.0, ans=0.1 2024-08-12 01:54:02,928 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.050e+01 2.487e+01 2.851e+01 3.233e+01 4.318e+01, threshold=5.702e+01, percent-clipped=0.0 2024-08-12 01:54:19,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1398220.0, ans=0.125 2024-08-12 01:54:29,331 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9400, loss[loss=0.09991, beats_loss=0.01116, ecapa_loss=0.0001666, whisper_loss=0.08708, over 22072.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01122, ecapa_loss=0.000185, whisper_loss=0.0919, over 3863373.54 frames. 
], batch size: 94, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:54:30,847 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 01:54:41,926 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 21 from LS+wenet, 26 from Vox, 47 fro AS 2024-08-12 01:54:44,568 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-12 01:55:01,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1398520.0, ans=0.0 2024-08-12 01:55:16,095 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=15.0 2024-08-12 01:55:24,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1398720.0, ans=0.0 2024-08-12 01:55:38,122 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9450, loss[loss=0.07704, beats_loss=0.01456, ecapa_loss=0.000179, whisper_loss=0.06069, over 18104.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01126, ecapa_loss=0.0001847, whisper_loss=0.0914, over 3840342.93 frames. ], batch size: 77, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:55:54,046 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.42 vs. limit=10.0 2024-08-12 01:56:04,131 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 01:56:05,661 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 01:56:05,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1399020.0, ans=0.025 2024-08-12 01:56:06,878 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 01:56:08,123 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-12 01:56:12,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1399020.0, ans=0.0 2024-08-12 01:56:20,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.626e+01 2.954e+01 3.375e+01 5.231e+01, threshold=5.908e+01, percent-clipped=0.0 2024-08-12 01:56:26,282 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 01:56:27,658 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-12 01:56:27,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1399120.0, ans=0.0 2024-08-12 01:56:34,630 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 01:56:34,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1399220.0, ans=0.0 2024-08-12 01:56:40,098 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 01:56:41,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1399220.0, ans=0.0 2024-08-12 01:56:46,564 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9500, loss[loss=0.108, beats_loss=0.01106, ecapa_loss=0.0001892, whisper_loss=0.09502, over 22600.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01127, ecapa_loss=0.0001852, whisper_loss=0.09043, over 3803329.91 frames. ], batch size: 90, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:57:10,711 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 01:57:13,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1399520.0, ans=0.125 2024-08-12 01:57:17,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1399520.0, ans=0.0 2024-08-12 01:57:32,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1399620.0, ans=0.1 2024-08-12 01:57:34,590 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 01:57:49,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1399720.0, ans=0.1 2024-08-12 01:57:51,544 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=22.5 2024-08-12 01:57:56,082 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9550, loss[loss=0.08734, beats_loss=0.01236, ecapa_loss=0.0002022, whisper_loss=0.07296, over 21525.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01117, ecapa_loss=0.0001853, whisper_loss=0.09122, over 3824570.08 frames. ], batch size: 89, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:57:57,729 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 01:58:00,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1399820.0, ans=0.125 2024-08-12 01:58:03,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1399820.0, ans=0.2 2024-08-12 01:58:40,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.623e+01 2.882e+01 3.186e+01 4.825e+01, threshold=5.764e+01, percent-clipped=0.0 2024-08-12 01:58:41,405 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-12 01:59:03,218 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 01:59:04,536 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 01:59:04,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1400220.0, ans=0.0 2024-08-12 01:59:06,741 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9600, loss[loss=0.09108, beats_loss=0.01092, ecapa_loss=0.0002408, whisper_loss=0.07775, over 20845.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01108, ecapa_loss=0.0001851, whisper_loss=0.09151, over 3829215.18 frames. ], batch size: 90, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:59:18,827 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
29 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 01:59:19,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1400320.0, ans=0.125 2024-08-12 01:59:19,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1400320.0, ans=0.2 2024-08-12 01:59:23,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1400420.0, ans=0.07 2024-08-12 01:59:28,444 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 01:59:28,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1400420.0, ans=0.125 2024-08-12 01:59:37,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1400520.0, ans=0.1 2024-08-12 01:59:42,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1400520.0, ans=0.0 2024-08-12 01:59:56,326 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 13 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 02:00:09,240 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 26 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 02:00:16,874 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9650, loss[loss=0.135, beats_loss=0.009778, ecapa_loss=0.0001844, whisper_loss=0.1234, over 16951.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01111, ecapa_loss=0.0001843, whisper_loss=0.09179, over 3826990.26 frames. 
], batch size: 64, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:00:26,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1400820.0, ans=0.0 2024-08-12 02:00:39,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1400920.0, ans=0.0 2024-08-12 02:00:53,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1401020.0, ans=0.125 2024-08-12 02:01:00,094 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.704e+01 3.034e+01 3.483e+01 7.919e+01, threshold=6.068e+01, percent-clipped=1.0 2024-08-12 02:01:00,228 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 02:01:16,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2024-08-12 02:01:18,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0 2024-08-12 02:01:26,571 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9700, loss[loss=0.08798, beats_loss=0.01346, ecapa_loss=0.0002243, whisper_loss=0.07227, over 21772.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0111, ecapa_loss=0.0001867, whisper_loss=0.09156, over 3808220.60 frames. ], batch size: 94, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:01:30,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1401320.0, ans=0.0 2024-08-12 02:01:36,306 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
33 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-12 02:01:39,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1401420.0, ans=0.0 2024-08-12 02:01:44,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2024-08-12 02:02:02,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1401520.0, ans=0.0 2024-08-12 02:02:03,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=15.0 2024-08-12 02:02:17,717 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2024-08-12 02:02:17,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.08 vs. limit=10.0 2024-08-12 02:02:23,996 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 02:02:36,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2024-08-12 02:02:36,880 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9750, loss[loss=0.127, beats_loss=0.005876, ecapa_loss=0.0002066, whisper_loss=0.1191, over 17966.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01108, ecapa_loss=0.0001872, whisper_loss=0.09142, over 3787513.89 frames. 
], batch size: 65, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:02:43,291 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.088e+01 2024-08-12 02:03:06,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1402020.0, ans=10.0 2024-08-12 02:03:07,786 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2024-08-12 02:03:20,684 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.664e+01 3.101e+01 3.565e+01 5.192e+01, threshold=6.201e+01, percent-clipped=0.0 2024-08-12 02:03:33,805 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 02:03:34,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1402220.0, ans=0.125 2024-08-12 02:03:47,778 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9800, loss[loss=0.1055, beats_loss=0.01234, ecapa_loss=0.0002053, whisper_loss=0.09114, over 21880.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01114, ecapa_loss=0.0001883, whisper_loss=0.09131, over 3789977.79 frames. 
], batch size: 94, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:03:54,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1402320.0, ans=0.125 2024-08-12 02:04:21,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1402520.0, ans=0.125 2024-08-12 02:04:23,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=1402520.0, ans=0.1 2024-08-12 02:04:33,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1402620.0, ans=0.125 2024-08-12 02:04:35,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1402620.0, ans=0.0 2024-08-12 02:04:45,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1402720.0, ans=0.125 2024-08-12 02:04:55,035 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 02:04:58,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9850, loss[loss=0.09264, beats_loss=0.01437, ecapa_loss=0.0001658, whisper_loss=0.07661, over 22666.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01112, ecapa_loss=0.0001881, whisper_loss=0.09201, over 3805708.54 frames. ], batch size: 93, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:05:07,056 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
30 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 02:05:14,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1402920.0, ans=0.1 2024-08-12 02:05:17,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1402920.0, ans=0.1 2024-08-12 02:05:42,035 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.518e+01 2.832e+01 3.271e+01 6.017e+01, threshold=5.663e+01, percent-clipped=0.0 2024-08-12 02:05:49,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1403120.0, ans=0.2 2024-08-12 02:05:50,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.49 vs. limit=15.0 2024-08-12 02:06:01,158 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-12 02:06:04,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1403220.0, ans=0.0 2024-08-12 02:06:09,022 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9900, loss[loss=0.1193, beats_loss=0.008108, ecapa_loss=0.0002008, whisper_loss=0.1092, over 18444.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01107, ecapa_loss=0.0001875, whisper_loss=0.09257, over 3830280.97 frames. ], batch size: 74, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:06:12,358 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
32 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 02:06:31,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1403420.0, ans=0.05 2024-08-12 02:06:39,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1403520.0, ans=0.0 2024-08-12 02:06:46,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1403520.0, ans=0.0 2024-08-12 02:06:48,712 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.94 vs. limit=15.0 2024-08-12 02:06:54,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=12.0 2024-08-12 02:06:58,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1403620.0, ans=0.0 2024-08-12 02:06:59,343 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 18 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-12 02:07:03,170 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 20 from LS+wenet, 17 from Vox, 52 fro AS 2024-08-12 02:07:03,976 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0 2024-08-12 02:07:19,998 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 9950, loss[loss=0.1164, beats_loss=0.009325, ecapa_loss=0.0002075, whisper_loss=0.105, over 19191.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01113, ecapa_loss=0.0001881, whisper_loss=0.09281, over 3866635.03 frames. 
], batch size: 71, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:07:21,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1403820.0, ans=0.0 2024-08-12 02:07:35,044 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-12 02:07:48,431 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 02:08:03,747 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.549e+01 2.857e+01 3.293e+01 8.751e+01, threshold=5.714e+01, percent-clipped=2.0 2024-08-12 02:08:04,077 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-12 02:08:24,949 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-12 02:08:29,933 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10000, loss[loss=0.08964, beats_loss=0.01204, ecapa_loss=0.0002169, whisper_loss=0.07543, over 21597.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01116, ecapa_loss=0.0001865, whisper_loss=0.09277, over 3875213.88 frames. ], batch size: 92, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:08:40,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1404320.0, ans=0.04949747468305833 2024-08-12 02:08:51,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1404420.0, ans=0.0 2024-08-12 02:08:52,602 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
20 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-12 02:09:13,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1404620.0, ans=0.2 2024-08-12 02:09:25,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1404620.0, ans=0.2 2024-08-12 02:09:43,347 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 27 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-12 02:09:44,463 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10050, loss[loss=0.1114, beats_loss=0.008319, ecapa_loss=0.0002198, whisper_loss=0.1009, over 15936.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0111, ecapa_loss=0.0001862, whisper_loss=0.09286, over 3857070.10 frames. ], batch size: 64, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:09:54,475 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 02:09:57,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1404920.0, ans=0.025 2024-08-12 02:10:08,095 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 02:10:20,302 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.431e+02 2024-08-12 02:10:30,645 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.648e+01 2.983e+01 3.418e+01 4.523e+01, threshold=5.967e+01, percent-clipped=0.0 2024-08-12 02:10:36,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1405120.0, ans=0.125 2024-08-12 02:10:39,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1405120.0, ans=0.125 2024-08-12 02:10:39,666 INFO [scaling.py:214] (2/4) 
ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1405120.0, ans=0.0 2024-08-12 02:10:47,561 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-12 02:11:01,653 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 02:11:02,979 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10100, loss[loss=0.1192, beats_loss=0.01056, ecapa_loss=0.0001642, whisper_loss=0.107, over 15042.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01111, ecapa_loss=0.0001858, whisper_loss=0.09321, over 3874569.73 frames. ], batch size: 57, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:11:38,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1405520.0, ans=0.125 2024-08-12 02:11:54,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1405620.0, ans=0.1 2024-08-12 02:12:17,580 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 15 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-12 02:12:27,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10150, loss[loss=0.1119, beats_loss=0.009669, ecapa_loss=0.0001904, whisper_loss=0.1004, over 15422.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01118, ecapa_loss=0.0001867, whisper_loss=0.09244, over 3902733.21 frames. 
], batch size: 61, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:12:27,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1405820.0, ans=0.2 2024-08-12 02:12:31,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1405820.0, ans=0.0 2024-08-12 02:12:41,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1405920.0, ans=0.125 2024-08-12 02:12:48,027 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 02:13:00,185 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-12 02:13:04,677 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 02:13:14,516 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 02:13:23,339 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.579e+01 2.918e+01 3.241e+01 4.906e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-12 02:13:36,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1406120.0, ans=0.1 2024-08-12 02:13:45,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1406220.0, ans=0.125 2024-08-12 02:14:07,864 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10200, loss[loss=0.1115, beats_loss=0.01178, ecapa_loss=0.0002286, whisper_loss=0.09741, over 22007.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01112, ecapa_loss=0.0001864, whisper_loss=0.09293, over 3879674.64 frames. 
], batch size: 95, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:14:14,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1406320.0, ans=0.04949747468305833 2024-08-12 02:14:18,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1406320.0, ans=0.0 2024-08-12 02:14:23,642 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 18 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 02:15:08,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1406520.0, ans=0.1 2024-08-12 02:15:24,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1406620.0, ans=0.0 2024-08-12 02:16:01,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10250, loss[loss=0.09264, beats_loss=0.0116, ecapa_loss=0.0001688, whisper_loss=0.07935, over 18844.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01108, ecapa_loss=0.0001842, whisper_loss=0.0932, over 3889016.41 frames. 
], batch size: 71, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:16:04,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1406820.0, ans=0.125 2024-08-12 02:16:31,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1406920.0, ans=0.0 2024-08-12 02:16:51,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1407020.0, ans=0.125 2024-08-12 02:17:04,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.107e+01 2.647e+01 2.891e+01 3.478e+01 5.936e+01, threshold=5.783e+01, percent-clipped=1.0 2024-08-12 02:17:07,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1407120.0, ans=0.07 2024-08-12 02:17:29,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1407220.0, ans=0.0 2024-08-12 02:17:36,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0 2024-08-12 02:17:43,205 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10300, loss[loss=0.09068, beats_loss=0.01235, ecapa_loss=0.0001947, whisper_loss=0.07638, over 15201.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01111, ecapa_loss=0.000185, whisper_loss=0.09209, over 3867949.59 frames. ], batch size: 62, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:17:54,168 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 02:17:55,711 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
31 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 02:18:10,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1407420.0, ans=0.05 2024-08-12 02:18:15,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1407420.0, ans=0.1 2024-08-12 02:18:32,679 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 02:18:40,987 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-12 02:18:46,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1407620.0, ans=0.125 2024-08-12 02:18:48,739 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 02:18:55,959 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.594e-03 2024-08-12 02:18:58,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1407720.0, ans=0.1 2024-08-12 02:19:01,672 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 21 from Vox, 50 fro AS 2024-08-12 02:19:02,963 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 30 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 02:19:04,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1407720.0, ans=0.125 2024-08-12 02:19:09,423 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.73 vs. 
limit=15.0 2024-08-12 02:19:11,396 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10350, loss[loss=0.0783, beats_loss=0.01191, ecapa_loss=0.0001805, whisper_loss=0.06459, over 20728.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01106, ecapa_loss=0.0001855, whisper_loss=0.09285, over 3892995.74 frames. ], batch size: 87, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:19:12,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1407820.0, ans=0.125 2024-08-12 02:19:56,147 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.600e+01 2.842e+01 3.107e+01 4.520e+01, threshold=5.684e+01, percent-clipped=0.0 2024-08-12 02:20:06,682 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 02:20:13,139 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 02:20:25,215 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10400, loss[loss=0.08829, beats_loss=0.01119, ecapa_loss=0.000165, whisper_loss=0.07545, over 17175.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01107, ecapa_loss=0.0001855, whisper_loss=0.09238, over 3884742.47 frames. ], batch size: 67, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:20:36,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1408320.0, ans=0.05 2024-08-12 02:20:48,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. 
limit=15.0 2024-08-12 02:20:52,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1408420.0, ans=0.1 2024-08-12 02:21:02,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2024-08-12 02:21:06,626 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 02:21:15,605 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.450e+01 2024-08-12 02:21:35,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1408720.0, ans=0.0 2024-08-12 02:21:37,536 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10450, loss[loss=0.1056, beats_loss=0.01175, ecapa_loss=0.0001644, whisper_loss=0.0922, over 22146.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01106, ecapa_loss=0.0001849, whisper_loss=0.09298, over 3898566.45 frames. ], batch size: 88, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:21:44,314 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-12 02:22:02,073 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 02:22:06,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1409020.0, ans=0.1 2024-08-12 02:22:08,424 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.92 vs. limit=15.0 2024-08-12 02:22:14,542 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.07 vs. 
limit=8.0 2024-08-12 02:22:21,862 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.627e+01 2.925e+01 3.348e+01 4.455e+01, threshold=5.851e+01, percent-clipped=0.0 2024-08-12 02:22:24,015 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. limit=6.0 2024-08-12 02:22:30,439 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 02:22:49,580 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10500, loss[loss=0.101, beats_loss=0.008791, ecapa_loss=0.0002032, whisper_loss=0.0902, over 15709.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01109, ecapa_loss=0.0001851, whisper_loss=0.09246, over 3892112.11 frames. ], batch size: 63, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:22:53,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1409320.0, ans=0.0 2024-08-12 02:23:07,303 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 02:23:16,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1409420.0, ans=0.1 2024-08-12 02:23:18,935 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 29 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-12 02:23:24,502 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 02:23:37,607 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 12 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 02:23:46,790 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 02:23:48,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1409720.0, ans=0.07 2024-08-12 02:23:52,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1409720.0, ans=0.1 2024-08-12 02:24:02,580 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10550, loss[loss=0.08464, beats_loss=0.01002, ecapa_loss=0.0002219, whisper_loss=0.0724, over 16564.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.0001847, whisper_loss=0.09241, over 3876679.43 frames. ], batch size: 68, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:24:11,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1409820.0, ans=0.125 2024-08-12 02:24:17,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1409920.0, ans=0.2 2024-08-12 02:24:28,252 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 02:24:46,342 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.599e+01 2.845e+01 3.296e+01 6.744e+01, threshold=5.691e+01, percent-clipped=1.0 2024-08-12 02:24:49,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1410120.0, ans=0.0 2024-08-12 02:24:52,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=22.5 2024-08-12 02:25:03,704 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
34 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-12 02:25:13,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10600, loss[loss=0.07958, beats_loss=0.01395, ecapa_loss=0.0001658, whisper_loss=0.06398, over 15759.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.0001864, whisper_loss=0.09253, over 3826086.30 frames. ], batch size: 61, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:25:13,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1410320.0, ans=0.07 2024-08-12 02:25:20,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0 2024-08-12 02:25:29,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1410420.0, ans=22.5 2024-08-12 02:25:49,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1410520.0, ans=0.05 2024-08-12 02:25:59,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0 2024-08-12 02:26:16,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1410720.0, ans=0.125 2024-08-12 02:26:22,465 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10650, loss[loss=0.0751, beats_loss=0.01356, ecapa_loss=0.0001763, whisper_loss=0.05977, over 14062.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.011, ecapa_loss=0.0001852, whisper_loss=0.09289, over 3832151.91 frames. 
], batch size: 58, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:26:24,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1410820.0, ans=0.125 2024-08-12 02:26:28,050 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 02:26:30,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1410820.0, ans=0.125 2024-08-12 02:26:38,703 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 02:26:43,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2024-08-12 02:26:43,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.72 vs. limit=15.0 2024-08-12 02:26:45,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1410920.0, ans=0.0 2024-08-12 02:26:48,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1411020.0, ans=0.125 2024-08-12 02:27:00,441 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 02:27:03,868 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-08-12 02:27:04,488 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.646e+01 2.959e+01 3.392e+01 4.637e+01, threshold=5.918e+01, percent-clipped=0.0 2024-08-12 02:27:06,077 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 02:27:07,520 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 02:27:11,131 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 02:27:20,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1411220.0, ans=0.125 2024-08-12 02:27:21,343 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 02:27:25,507 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 02:27:30,810 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10700, loss[loss=0.1125, beats_loss=0.01083, ecapa_loss=0.0001685, whisper_loss=0.1, over 22073.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01106, ecapa_loss=0.0001842, whisper_loss=0.09325, over 3884464.63 frames. ], batch size: 87, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:27:37,851 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 02:27:40,682 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 02:27:41,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1411320.0, ans=0.0 2024-08-12 02:27:47,941 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 02:27:52,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1411420.0, ans=0.2 2024-08-12 02:27:53,661 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 02:28:03,429 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
18 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-12 02:28:13,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1411620.0, ans=0.05 2024-08-12 02:28:17,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1411620.0, ans=0.125 2024-08-12 02:28:40,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10750, loss[loss=0.1105, beats_loss=0.009098, ecapa_loss=0.0001775, whisper_loss=0.09959, over 16810.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01105, ecapa_loss=0.0001835, whisper_loss=0.09339, over 3862336.79 frames. ], batch size: 66, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:28:44,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1411820.0, ans=0.04949747468305833 2024-08-12 02:28:56,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1411920.0, ans=0.125 2024-08-12 02:28:57,835 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 02:29:02,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1411920.0, ans=0.125 2024-08-12 02:29:11,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1412020.0, ans=10.0 2024-08-12 02:29:14,358 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-12 02:29:22,716 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.596e+01 2.921e+01 3.440e+01 9.548e+01, threshold=5.843e+01, percent-clipped=1.0 2024-08-12 02:29:30,268 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 02:29:42,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1412220.0, ans=0.0 2024-08-12 02:29:46,352 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 02:29:48,779 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10800, loss[loss=0.1183, beats_loss=0.009479, ecapa_loss=0.0001708, whisper_loss=0.1071, over 21859.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01106, ecapa_loss=0.0001834, whisper_loss=0.09349, over 3871373.65 frames. ], batch size: 83, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:29:53,077 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 02:29:54,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1412320.0, ans=0.125 2024-08-12 02:30:04,069 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-12 02:30:12,175 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 02:30:13,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1412420.0, ans=0.035 2024-08-12 02:30:18,049 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.29 vs. 
limit=15.0 2024-08-12 02:30:20,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1412520.0, ans=0.125 2024-08-12 02:30:24,488 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 02:30:36,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1412620.0, ans=0.125 2024-08-12 02:30:56,334 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10850, loss[loss=0.07086, beats_loss=0.01047, ecapa_loss=0.0001931, whisper_loss=0.05845, over 13958.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01109, ecapa_loss=0.0001832, whisper_loss=0.09324, over 3878464.77 frames. ], batch size: 57, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:30:58,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1412820.0, ans=0.0 2024-08-12 02:31:07,999 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 02:31:25,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1413020.0, ans=0.125 2024-08-12 02:31:35,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1413020.0, ans=0.1 2024-08-12 02:31:39,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.032e+01 2.708e+01 3.088e+01 3.544e+01 8.247e+01, threshold=6.177e+01, percent-clipped=2.0 2024-08-12 02:31:42,518 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
14 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 02:31:49,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1413120.0, ans=0.0 2024-08-12 02:32:05,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1413320.0, ans=15.0 2024-08-12 02:32:06,803 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10900, loss[loss=0.1138, beats_loss=0.01029, ecapa_loss=0.000157, whisper_loss=0.1019, over 17384.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01112, ecapa_loss=0.0001836, whisper_loss=0.09288, over 3879828.71 frames. ], batch size: 63, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:32:17,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1413320.0, ans=0.2 2024-08-12 02:32:24,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1413420.0, ans=0.0 2024-08-12 02:32:43,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1413520.0, ans=0.125 2024-08-12 02:32:47,505 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 02:32:58,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1413620.0, ans=0.2 2024-08-12 02:33:01,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-12 02:33:12,854 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-12 02:33:13,432 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.29 vs. 
limit=12.0 2024-08-12 02:33:16,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1413720.0, ans=0.2 2024-08-12 02:33:18,379 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 10950, loss[loss=0.1117, beats_loss=0.009664, ecapa_loss=0.0002163, whisper_loss=0.09986, over 21597.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01104, ecapa_loss=0.0001836, whisper_loss=0.09313, over 3877627.70 frames. ], batch size: 87, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:33:23,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1413820.0, ans=0.125 2024-08-12 02:33:33,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1413920.0, ans=0.1 2024-08-12 02:33:40,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1413920.0, ans=0.125 2024-08-12 02:33:42,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1413920.0, ans=0.125 2024-08-12 02:33:46,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1414020.0, ans=0.0 2024-08-12 02:33:48,697 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 16 from Vox, 52 fro AS 2024-08-12 02:33:50,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1414020.0, ans=0.2 2024-08-12 02:34:00,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.632e+01 3.025e+01 3.424e+01 7.059e+01, threshold=6.051e+01, percent-clipped=1.0 2024-08-12 02:34:15,167 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
26 from LS+wenet, 12 from Vox, 44 fro AS 2024-08-12 02:34:18,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1414220.0, ans=0.125 2024-08-12 02:34:27,530 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11000, loss[loss=0.09559, beats_loss=0.01231, ecapa_loss=0.000191, whisper_loss=0.08137, over 22427.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01096, ecapa_loss=0.0001857, whisper_loss=0.09333, over 3864130.69 frames. ], batch size: 93, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:34:33,459 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 02:34:34,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1414320.0, ans=0.125 2024-08-12 02:34:42,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1414420.0, ans=0.125 2024-08-12 02:34:56,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1414520.0, ans=0.2 2024-08-12 02:35:27,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1414720.0, ans=0.07 2024-08-12 02:35:35,794 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11050, loss[loss=0.08006, beats_loss=0.009428, ecapa_loss=0.0001821, whisper_loss=0.06882, over 14925.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01093, ecapa_loss=0.0001858, whisper_loss=0.09347, over 3896266.07 frames. ], batch size: 58, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:35:49,299 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
26 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-12 02:36:18,457 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.531e+01 2.878e+01 3.285e+01 6.916e+01, threshold=5.755e+01, percent-clipped=1.0 2024-08-12 02:36:20,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1415120.0, ans=0.1 2024-08-12 02:36:38,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1415220.0, ans=0.04949747468305833 2024-08-12 02:36:45,015 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11100, loss[loss=0.1058, beats_loss=0.01041, ecapa_loss=0.0001844, whisper_loss=0.09358, over 19019.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01103, ecapa_loss=0.0001847, whisper_loss=0.09251, over 3871873.84 frames. ], batch size: 72, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:36:45,237 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 02:36:46,625 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 02:37:02,683 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 02:37:05,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1415420.0, ans=0.0 2024-08-12 02:37:07,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1415420.0, ans=0.04949747468305833 2024-08-12 02:37:37,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.27 vs. 
limit=15.0 2024-08-12 02:37:56,021 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11150, loss[loss=0.1174, beats_loss=0.01085, ecapa_loss=0.0001871, whisper_loss=0.1047, over 21784.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01106, ecapa_loss=0.0001828, whisper_loss=0.09312, over 3895775.31 frames. ], batch size: 87, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:38:01,693 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 02:38:03,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2024-08-12 02:38:04,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.10 vs. limit=22.5 2024-08-12 02:38:15,917 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-12 02:38:20,235 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 36 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 02:38:29,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2024-08-12 02:38:31,732 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 02:38:39,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.570e+01 2.845e+01 3.196e+01 4.459e+01, threshold=5.690e+01, percent-clipped=0.0 2024-08-12 02:38:57,480 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 02:39:01,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1416220.0, ans=0.125 2024-08-12 02:39:06,869 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11200, loss[loss=0.08979, beats_loss=0.01085, ecapa_loss=0.0002112, whisper_loss=0.07683, over 16111.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0111, ecapa_loss=0.0001819, whisper_loss=0.09289, over 3868834.06 frames. ], batch size: 65, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:39:07,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1416320.0, ans=0.0 2024-08-12 02:39:15,057 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 02:39:19,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1416420.0, ans=10.0 2024-08-12 02:39:39,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1416520.0, ans=0.125 2024-08-12 02:39:49,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1416620.0, ans=0.125 2024-08-12 02:39:54,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1416620.0, ans=0.125 2024-08-12 02:40:01,821 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 24 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-12 02:40:06,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.28 vs. limit=15.0 2024-08-12 02:40:10,120 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 02:40:16,478 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11250, loss[loss=0.1027, beats_loss=0.009706, ecapa_loss=0.0002263, whisper_loss=0.09076, over 18761.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01107, ecapa_loss=0.0001834, whisper_loss=0.09284, over 3855892.37 frames. ], batch size: 79, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:40:29,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1416920.0, ans=0.1 2024-08-12 02:40:32,344 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 02:40:33,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1416920.0, ans=0.125 2024-08-12 02:40:40,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1416920.0, ans=0.125 2024-08-12 02:40:40,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1416920.0, ans=0.125 2024-08-12 02:40:52,905 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 02:40:55,382 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 02:40:59,360 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.698e+01 3.076e+01 3.539e+01 6.948e+01, threshold=6.153e+01, percent-clipped=1.0 2024-08-12 02:41:02,309 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 02:41:12,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1417220.0, ans=0.125 2024-08-12 02:41:23,360 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 33 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-12 02:41:25,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11300, loss[loss=0.1045, beats_loss=0.01079, ecapa_loss=0.0002084, whisper_loss=0.09166, over 14578.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01112, ecapa_loss=0.0001829, whisper_loss=0.0932, over 3881585.14 frames. ], batch size: 59, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:41:36,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1417320.0, ans=0.2 2024-08-12 02:41:38,858 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-12 02:41:40,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1417420.0, ans=0.0 2024-08-12 02:41:44,964 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 02:41:45,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=22.5 2024-08-12 02:41:49,373 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.948e-02 2024-08-12 02:42:06,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1417520.0, ans=0.1 2024-08-12 02:42:06,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1417520.0, ans=0.125 2024-08-12 02:42:18,752 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
21 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 02:42:35,638 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11350, loss[loss=0.09798, beats_loss=0.01212, ecapa_loss=0.0002084, whisper_loss=0.08378, over 20510.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01111, ecapa_loss=0.0001824, whisper_loss=0.09315, over 3865078.31 frames. ], batch size: 84, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:42:37,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1417820.0, ans=0.125 2024-08-12 02:42:44,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1417820.0, ans=0.125 2024-08-12 02:42:45,319 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-12 02:42:55,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.38 vs. limit=15.0 2024-08-12 02:42:58,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1417920.0, ans=0.125 2024-08-12 02:43:05,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2024-08-12 02:43:11,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1418020.0, ans=0.0 2024-08-12 02:43:18,342 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.545e+01 2.820e+01 3.202e+01 5.315e+01, threshold=5.639e+01, percent-clipped=0.0 2024-08-12 02:43:45,037 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11400, loss[loss=0.08829, beats_loss=0.01348, ecapa_loss=0.0002103, whisper_loss=0.07271, over 12909.00 frames. 
], tot_loss[loss=0.1069, beats_loss=0.01104, ecapa_loss=0.000183, whisper_loss=0.09404, over 3872021.80 frames. ], batch size: 55, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:43:52,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1418320.0, ans=0.125 2024-08-12 02:43:52,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1418320.0, ans=0.125 2024-08-12 02:44:39,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.90 vs. limit=22.5 2024-08-12 02:44:50,028 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0 2024-08-12 02:44:53,369 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11450, loss[loss=0.06317, beats_loss=0.01439, ecapa_loss=0.0001368, whisper_loss=0.04741, over 14307.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01109, ecapa_loss=0.0001825, whisper_loss=0.09338, over 3859030.18 frames. ], batch size: 58, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:44:57,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1418820.0, ans=0.125 2024-08-12 02:44:59,033 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 02:45:12,629 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 02:45:28,430 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
21 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 02:45:28,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1419020.0, ans=0.1 2024-08-12 02:45:31,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1419020.0, ans=0.0 2024-08-12 02:45:32,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1419020.0, ans=0.125 2024-08-12 02:45:33,640 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-12 02:45:36,290 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.629e+01 3.024e+01 3.484e+01 5.992e+01, threshold=6.048e+01, percent-clipped=1.0 2024-08-12 02:45:38,074 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 02:45:46,329 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 02:45:53,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1419220.0, ans=0.1 2024-08-12 02:45:57,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1419220.0, ans=0.0 2024-08-12 02:46:01,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1419320.0, ans=0.125 2024-08-12 02:46:01,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1419320.0, ans=0.125 2024-08-12 02:46:02,700 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11500, loss[loss=0.1242, beats_loss=0.006795, ecapa_loss=0.0001915, whisper_loss=0.1155, over 17903.00 frames. 
], tot_loss[loss=0.1068, beats_loss=0.01106, ecapa_loss=0.0001816, whisper_loss=0.09393, over 3898438.39 frames. ], batch size: 69, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:46:04,130 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 30 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 02:46:24,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1419420.0, ans=0.0 2024-08-12 02:46:25,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1419420.0, ans=10.0 2024-08-12 02:46:32,275 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 02:46:32,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1419520.0, ans=0.125 2024-08-12 02:46:39,347 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 02:46:46,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1419620.0, ans=0.125 2024-08-12 02:46:51,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1419620.0, ans=0.125 2024-08-12 02:46:58,470 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 02:47:11,204 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11550, loss[loss=0.09495, beats_loss=0.01167, ecapa_loss=0.0002469, whisper_loss=0.08081, over 18908.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01106, ecapa_loss=0.0001821, whisper_loss=0.0935, over 3875197.34 frames. 
], batch size: 81, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:47:22,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1419820.0, ans=0.2 2024-08-12 02:47:24,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1419920.0, ans=0.125 2024-08-12 02:47:25,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1419920.0, ans=0.1 2024-08-12 02:47:31,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1419920.0, ans=0.95 2024-08-12 02:47:32,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1419920.0, ans=0.125 2024-08-12 02:47:53,660 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.668e+01 3.016e+01 3.497e+01 6.036e+01, threshold=6.031e+01, percent-clipped=0.0 2024-08-12 02:47:54,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1420120.0, ans=0.0 2024-08-12 02:48:02,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1420120.0, ans=0.0 2024-08-12 02:48:02,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1420120.0, ans=0.125 2024-08-12 02:48:05,579 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 02:48:05,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1420220.0, ans=0.125 2024-08-12 02:48:20,576 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11600, loss[loss=0.08699, beats_loss=0.0129, ecapa_loss=0.0001887, whisper_loss=0.0722, over 19635.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01099, ecapa_loss=0.0001839, whisper_loss=0.09448, over 3907719.21 frames. ], batch size: 82, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:48:43,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1420420.0, ans=0.125 2024-08-12 02:48:49,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1420520.0, ans=0.125 2024-08-12 02:49:05,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2024-08-12 02:49:21,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1420720.0, ans=0.125 2024-08-12 02:49:29,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11650, loss[loss=0.1163, beats_loss=0.01045, ecapa_loss=0.0001859, whisper_loss=0.104, over 17746.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01103, ecapa_loss=0.0001849, whisper_loss=0.09433, over 3904865.57 frames. 
], batch size: 70, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:49:31,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1420820.0, ans=0.1 2024-08-12 02:49:32,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1420820.0, ans=0.0 2024-08-12 02:49:41,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1420820.0, ans=0.2 2024-08-12 02:49:43,529 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 02:49:55,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1421020.0, ans=0.125 2024-08-12 02:50:08,346 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-12 02:50:09,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1421120.0, ans=0.125 2024-08-12 02:50:12,274 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.632e+01 2.905e+01 3.202e+01 4.413e+01, threshold=5.810e+01, percent-clipped=0.0 2024-08-12 02:50:16,286 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 02:50:26,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1421220.0, ans=0.125 2024-08-12 02:50:38,294 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11700, loss[loss=0.1009, beats_loss=0.01287, ecapa_loss=0.0001949, whisper_loss=0.08612, over 19019.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01107, ecapa_loss=0.000185, whisper_loss=0.09384, over 3878475.41 frames. 
], batch size: 80, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:50:54,528 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 02:51:01,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1421420.0, ans=0.125 2024-08-12 02:51:34,951 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 02:51:41,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1421720.0, ans=0.125 2024-08-12 02:51:46,591 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11750, loss[loss=0.09614, beats_loss=0.01112, ecapa_loss=0.0001753, whisper_loss=0.08326, over 15047.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01116, ecapa_loss=0.0001841, whisper_loss=0.09328, over 3879460.49 frames. ], batch size: 59, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:51:46,843 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 02:51:53,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1421820.0, ans=0.2 2024-08-12 02:51:59,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1421920.0, ans=0.0 2024-08-12 02:52:17,102 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 02:52:29,193 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.531e+01 2.844e+01 3.355e+01 7.826e+01, threshold=5.688e+01, percent-clipped=1.0 2024-08-12 02:52:34,393 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. 
limit=15.0 2024-08-12 02:52:35,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.31 vs. limit=15.0 2024-08-12 02:52:46,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1422220.0, ans=0.07 2024-08-12 02:52:55,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11800, loss[loss=0.1154, beats_loss=0.01066, ecapa_loss=0.0001662, whisper_loss=0.1031, over 22417.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01121, ecapa_loss=0.0001832, whisper_loss=0.09336, over 3892348.72 frames. ], batch size: 90, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:53:02,020 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 02:53:07,251 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 02:53:10,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1422420.0, ans=0.0 2024-08-12 02:53:15,975 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-12 02:53:20,028 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 02:53:22,140 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.92 vs. limit=22.5 2024-08-12 02:53:24,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1422520.0, ans=0.125 2024-08-12 02:53:27,867 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.09 vs. limit=15.0 2024-08-12 02:53:29,661 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
32 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 02:53:37,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1422620.0, ans=0.125 2024-08-12 02:53:38,934 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.555e-01 2024-08-12 02:53:40,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1422620.0, ans=0.125 2024-08-12 02:53:42,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1422620.0, ans=0.07 2024-08-12 02:53:51,005 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-12 02:54:01,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=1422720.0, ans=0.1 2024-08-12 02:54:04,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11850, loss[loss=0.08821, beats_loss=0.01291, ecapa_loss=0.0001682, whisper_loss=0.07362, over 21960.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01123, ecapa_loss=0.0001833, whisper_loss=0.09216, over 3876606.04 frames. ], batch size: 90, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:54:18,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1422920.0, ans=0.0 2024-08-12 02:54:35,070 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 02:54:47,415 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.632e+01 2.955e+01 3.333e+01 2.077e+02, threshold=5.910e+01, percent-clipped=1.0 2024-08-12 02:55:12,410 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11900, loss[loss=0.08352, beats_loss=0.01441, ecapa_loss=0.0001732, whisper_loss=0.06739, over 16293.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.01121, ecapa_loss=0.0001843, whisper_loss=0.09245, over 3912320.24 frames. ], batch size: 67, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:55:19,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1423320.0, ans=0.125 2024-08-12 02:55:22,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1423320.0, ans=10.0 2024-08-12 02:55:35,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1423420.0, ans=0.2 2024-08-12 02:55:39,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.65 vs. limit=15.0 2024-08-12 02:55:40,221 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 02:55:46,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1423520.0, ans=0.125 2024-08-12 02:55:50,441 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 02:55:51,657 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 02:55:53,117 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-12 02:56:08,717 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-12 02:56:15,768 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 02:56:22,505 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 11950, loss[loss=0.1142, beats_loss=0.008523, ecapa_loss=0.0002529, whisper_loss=0.1031, over 21679.00 frames. 
], tot_loss[loss=0.1059, beats_loss=0.01106, ecapa_loss=0.0001867, whisper_loss=0.09296, over 3891447.08 frames. ], batch size: 93, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:56:24,142 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 02:56:45,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1423920.0, ans=0.1 2024-08-12 02:57:00,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1424020.0, ans=0.0 2024-08-12 02:57:06,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.496e+01 2.723e+01 3.288e+01 6.365e+01, threshold=5.445e+01, percent-clipped=1.0 2024-08-12 02:57:31,484 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12000, loss[loss=0.1317, beats_loss=0.009174, ecapa_loss=0.0002097, whisper_loss=0.1204, over 22937.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01111, ecapa_loss=0.0001854, whisper_loss=0.09241, over 3885609.25 frames. ], batch size: 89, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:57:31,485 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 02:58:10,695 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on ASR_libri: loss=0.2556, beats_loss=0, ecapa_loss=0.0006161, whisper_loss=0.2495, over 922467.00 frames. 2024-08-12 02:58:28,814 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on SV_voxceleb1: loss=0.005027, beats_loss=0, ecapa_loss=0.0005027, whisper_loss=0, over 939242.00 frames. 2024-08-12 03:00:26,519 INFO [train_multi_KD3.py:1149] (2/4) Epoch 10, validation on AT_audioset: loss=0.02469, beats_loss=0.02469, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 03:00:26,523 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 03:00:32,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5 2024-08-12 03:00:38,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=22.5 2024-08-12 03:00:44,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1424420.0, ans=0.2 2024-08-12 03:00:52,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1424520.0, ans=0.125 2024-08-12 03:01:01,244 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.106e-02 2024-08-12 03:01:07,426 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-12 03:01:24,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1424720.0, ans=0.0 2024-08-12 03:01:26,490 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-12 03:01:36,061 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12050, loss[loss=0.1017, beats_loss=0.01151, ecapa_loss=0.0001689, whisper_loss=0.08855, over 15986.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01108, ecapa_loss=0.0001843, whisper_loss=0.09299, over 3862828.83 frames. ], batch size: 60, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:01:38,584 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0 2024-08-12 03:01:47,444 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
16 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 03:02:00,122 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 03:02:02,889 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 03:02:19,401 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.644e+01 2.915e+01 3.248e+01 4.728e+01, threshold=5.830e+01, percent-clipped=0.0 2024-08-12 03:02:24,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1425120.0, ans=0.2 2024-08-12 03:02:25,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1425120.0, ans=0.125 2024-08-12 03:02:39,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1425220.0, ans=0.0 2024-08-12 03:02:45,901 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12100, loss[loss=0.1036, beats_loss=0.01274, ecapa_loss=0.0001604, whisper_loss=0.0893, over 20915.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.011, ecapa_loss=0.000185, whisper_loss=0.09304, over 3853438.72 frames. 
], batch size: 84, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:02:51,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1425320.0, ans=0.07 2024-08-12 03:02:53,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1425320.0, ans=0.0 2024-08-12 03:02:54,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1425320.0, ans=0.125 2024-08-12 03:02:58,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1425420.0, ans=0.1 2024-08-12 03:03:04,248 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 03:03:29,990 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 13 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 03:03:33,101 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0 2024-08-12 03:03:33,906 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 03:03:36,548 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 03:03:42,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1425720.0, ans=0.125 2024-08-12 03:03:45,048 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 03:03:54,991 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12150, loss[loss=0.09029, beats_loss=0.009671, ecapa_loss=0.0002318, whisper_loss=0.0783, over 14343.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.011, ecapa_loss=0.0001861, whisper_loss=0.09268, over 3838853.76 frames. 
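The `scaling.py:214` lines record `ScheduledFloat` values (skip rates, balancer probabilities, `scale_min`, dropout) at a given `batch_count`. To my understanding, icefall's `ScheduledFloat` interpolates piecewise-linearly between `(batch_count, value)` breakpoints and clamps at the endpoints; the sketch below assumes that behavior and is not the actual `scaling.py` implementation:

```python
def scheduled_float(batch_count, schedule):
    # schedule: sorted list of (batch_count, value) breakpoints.
    # Piecewise-linear interpolation between breakpoints, clamped
    # to the first/last value outside the range (assumed semantics).
    pts = sorted(schedule)
    if batch_count <= pts[0][0]:
        return pts[0][1]
    if batch_count >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Hypothetical schedule: a skip rate annealed from 0.9 to 0.1 over 1000 batches.
print(scheduled_float(500, [(0, 0.9), (1000, 0.1)]))  # -> 0.5
```

At batch counts past 1.4M, as in this log, most schedules have long since clamped to their final values, which is why the logged `ans` values are constant across nearby batches.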
], batch size: 59, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:04:08,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1425920.0, ans=0.125 2024-08-12 03:04:10,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1425920.0, ans=0.09899494936611666 2024-08-12 03:04:23,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1426020.0, ans=0.0 2024-08-12 03:04:27,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.39 vs. limit=10.0 2024-08-12 03:04:29,874 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 03:04:38,041 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.674e+01 3.067e+01 3.443e+01 6.340e+01, threshold=6.135e+01, percent-clipped=1.0 2024-08-12 03:04:58,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1426220.0, ans=0.0 2024-08-12 03:05:04,291 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12200, loss[loss=0.09938, beats_loss=0.01042, ecapa_loss=0.0002314, whisper_loss=0.08665, over 18999.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01106, ecapa_loss=0.0001868, whisper_loss=0.09237, over 3815990.02 frames. ], batch size: 79, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:05:07,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1426320.0, ans=0.125 2024-08-12 03:05:20,524 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
33 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 03:05:22,302 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2024-08-12 03:05:26,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1426420.0, ans=0.0 2024-08-12 03:05:28,073 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.75 vs. limit=22.5 2024-08-12 03:05:38,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.89 vs. limit=15.0 2024-08-12 03:05:41,122 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 03:05:59,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1426720.0, ans=0.125 2024-08-12 03:06:09,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1426720.0, ans=0.125 2024-08-12 03:06:13,041 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12250, loss[loss=0.108, beats_loss=0.01046, ecapa_loss=0.0002303, whisper_loss=0.09519, over 18125.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01109, ecapa_loss=0.0001853, whisper_loss=0.09273, over 3805682.48 frames. ], batch size: 76, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:06:17,486 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-12 03:06:21,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1426820.0, ans=6.0 2024-08-12 03:06:36,742 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 03:06:50,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2024-08-12 03:06:56,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.672e+01 2.930e+01 3.249e+01 5.324e+01, threshold=5.861e+01, percent-clipped=0.0 2024-08-12 03:07:02,337 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 03:07:08,049 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 03:07:11,793 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=17.87 vs. limit=15.0 2024-08-12 03:07:23,269 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12300, loss[loss=0.1039, beats_loss=0.01251, ecapa_loss=0.0001687, whisper_loss=0.08967, over 22034.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01108, ecapa_loss=0.0001859, whisper_loss=0.09273, over 3836252.69 frames. ], batch size: 88, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:07:40,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0 2024-08-12 03:08:14,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-12 03:08:15,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1427620.0, ans=0.125 2024-08-12 03:08:30,300 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 03:08:32,589 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12350, loss[loss=0.0993, beats_loss=0.008789, ecapa_loss=0.0001558, whisper_loss=0.08896, over 15108.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01118, ecapa_loss=0.0001862, whisper_loss=0.09219, over 3834714.19 frames. ], batch size: 55, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:08:47,136 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2024-08-12 03:09:06,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1428020.0, ans=0.0 2024-08-12 03:09:16,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-08-12 03:09:18,887 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.674e+01 3.021e+01 3.383e+01 7.125e+01, threshold=6.043e+01, percent-clipped=2.0 2024-08-12 03:09:48,010 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12400, loss[loss=0.1008, beats_loss=0.01169, ecapa_loss=0.0001876, whisper_loss=0.08718, over 22250.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01119, ecapa_loss=0.0001849, whisper_loss=0.0926, over 3862171.02 frames. 
], batch size: 91, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:09:48,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1428320.0, ans=0.125 2024-08-12 03:10:04,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1428420.0, ans=0.125 2024-08-12 03:10:05,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2024-08-12 03:10:13,653 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 03:10:21,076 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 03:10:31,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1428620.0, ans=0.1 2024-08-12 03:10:57,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1428720.0, ans=0.125 2024-08-12 03:11:02,572 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12450, loss[loss=0.0988, beats_loss=0.01184, ecapa_loss=0.0001732, whisper_loss=0.08523, over 19140.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01118, ecapa_loss=0.0001844, whisper_loss=0.09215, over 3881723.77 frames. ], batch size: 74, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:11:24,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1428920.0, ans=0.0 2024-08-12 03:11:46,529 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.648e+01 2.502e+01 2.764e+01 3.282e+01 5.590e+01, threshold=5.528e+01, percent-clipped=0.0 2024-08-12 03:11:49,858 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
29 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-12 03:11:53,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1429120.0, ans=0.1 2024-08-12 03:11:59,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-08-12 03:12:03,600 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 20 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 03:12:03,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1429220.0, ans=0.125 2024-08-12 03:12:05,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1429220.0, ans=0.0 2024-08-12 03:12:06,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1429220.0, ans=0.125 2024-08-12 03:12:14,732 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12500, loss[loss=0.09814, beats_loss=0.01292, ecapa_loss=0.0002065, whisper_loss=0.08316, over 20852.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0111, ecapa_loss=0.000185, whisper_loss=0.09216, over 3892303.51 frames. ], batch size: 88, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:12:27,339 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 03:12:37,674 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
29 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 03:12:37,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1429420.0, ans=0.05 2024-08-12 03:12:54,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1429520.0, ans=0.035 2024-08-12 03:13:27,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12550, loss[loss=0.09978, beats_loss=0.01023, ecapa_loss=0.0001944, whisper_loss=0.0876, over 19486.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01113, ecapa_loss=0.0001844, whisper_loss=0.09194, over 3904143.13 frames. ], batch size: 79, lr: 6.20e-03, grad_scale: 2.305843009213694e+18 2024-08-12 03:13:28,656 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 03:13:34,075 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 03:13:46,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1429920.0, ans=0.0 2024-08-12 03:14:04,302 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 03:14:07,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1430020.0, ans=0.1 2024-08-12 03:14:12,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.663e+01 2.938e+01 3.317e+01 5.229e+01, threshold=5.876e+01, percent-clipped=0.0 2024-08-12 03:14:23,011 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 03:14:24,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1430220.0, ans=0.0 2024-08-12 03:14:38,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12600, loss[loss=0.09313, beats_loss=0.01077, ecapa_loss=0.0001964, whisper_loss=0.08039, over 16972.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01101, ecapa_loss=0.0001857, whisper_loss=0.09319, over 3898761.39 frames. ], batch size: 68, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:14:47,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1430320.0, ans=0.0 2024-08-12 03:14:55,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1430420.0, ans=0.035 2024-08-12 03:14:55,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1430420.0, ans=0.125 2024-08-12 03:15:01,146 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-12 03:15:07,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=22.5 2024-08-12 03:15:20,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1430520.0, ans=0.0 2024-08-12 03:15:27,885 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
17 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-12 03:15:43,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1430720.0, ans=0.125 2024-08-12 03:15:52,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12650, loss[loss=0.1065, beats_loss=0.0124, ecapa_loss=0.0001452, whisper_loss=0.09267, over 18308.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01106, ecapa_loss=0.0001859, whisper_loss=0.09249, over 3876574.96 frames. ], batch size: 71, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:16:04,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-12 03:16:07,032 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 03:16:18,471 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 03:16:23,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1431020.0, ans=0.0 2024-08-12 03:16:33,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1431020.0, ans=0.2 2024-08-12 03:16:38,972 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.672e+01 3.119e+01 3.630e+01 6.657e+01, threshold=6.239e+01, percent-clipped=2.0 2024-08-12 03:16:51,818 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 03:16:53,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1431220.0, ans=0.125 2024-08-12 03:17:05,573 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12700, loss[loss=0.08246, beats_loss=0.01464, ecapa_loss=0.0001495, whisper_loss=0.06633, over 21101.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01109, ecapa_loss=0.0001839, whisper_loss=0.09326, over 3878920.11 frames. ], batch size: 89, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:17:11,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1431320.0, ans=0.025 2024-08-12 03:17:36,514 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 03:17:51,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1431620.0, ans=10.0 2024-08-12 03:17:51,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2024-08-12 03:18:02,601 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 03:18:09,922 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 03:18:18,305 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12750, loss[loss=0.1123, beats_loss=0.009185, ecapa_loss=0.0002161, whisper_loss=0.1009, over 16018.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01113, ecapa_loss=0.0001836, whisper_loss=0.09309, over 3902914.38 frames. 
], batch size: 65, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:18:24,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1431820.0, ans=0.125 2024-08-12 03:18:32,601 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-12 03:18:47,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1432020.0, ans=0.125 2024-08-12 03:18:47,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1432020.0, ans=0.07 2024-08-12 03:18:52,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1432020.0, ans=0.125 2024-08-12 03:18:58,835 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 03:19:02,885 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.558e+01 2.840e+01 3.489e+01 4.506e+01, threshold=5.680e+01, percent-clipped=0.0 2024-08-12 03:19:03,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1432120.0, ans=0.125 2024-08-12 03:19:29,715 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12800, loss[loss=0.08616, beats_loss=0.008411, ecapa_loss=0.0002355, whisper_loss=0.0754, over 14023.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0112, ecapa_loss=0.0001846, whisper_loss=0.09272, over 3911275.74 frames. 
], batch size: 57, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:19:36,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1432320.0, ans=0.1 2024-08-12 03:19:41,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1432320.0, ans=0.125 2024-08-12 03:19:54,836 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-12 03:20:09,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1432520.0, ans=0.125 2024-08-12 03:20:13,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1432620.0, ans=0.2 2024-08-12 03:20:18,685 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 13 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-12 03:20:21,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=12.0 2024-08-12 03:20:39,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12850, loss[loss=0.09818, beats_loss=0.01175, ecapa_loss=0.0002068, whisper_loss=0.08436, over 18400.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01126, ecapa_loss=0.0001835, whisper_loss=0.09243, over 3898226.37 frames. ], batch size: 76, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:20:45,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.92 vs. 
limit=15.0 2024-08-12 03:21:00,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1432920.0, ans=0.125 2024-08-12 03:21:03,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1432920.0, ans=0.125 2024-08-12 03:21:23,445 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.483e+01 2.799e+01 3.175e+01 4.760e+01, threshold=5.599e+01, percent-clipped=0.0 2024-08-12 03:21:30,966 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 03:21:32,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1433120.0, ans=0.0 2024-08-12 03:21:36,395 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 03:21:48,370 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12900, loss[loss=0.1064, beats_loss=0.009906, ecapa_loss=0.0002065, whisper_loss=0.09444, over 23328.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01126, ecapa_loss=0.0001848, whisper_loss=0.09165, over 3874735.14 frames. ], batch size: 93, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:21:54,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=12.0 2024-08-12 03:22:05,578 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-12 03:22:08,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1433420.0, ans=0.125 2024-08-12 03:22:26,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1433520.0, ans=0.125 2024-08-12 03:22:27,312 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 03:22:33,404 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 03:22:58,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 12950, loss[loss=0.081, beats_loss=0.01441, ecapa_loss=0.0002464, whisper_loss=0.06413, over 14590.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01125, ecapa_loss=0.0001846, whisper_loss=0.09113, over 3862194.94 frames. ], batch size: 62, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:23:09,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1433820.0, ans=0.1 2024-08-12 03:23:11,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1433820.0, ans=0.035 2024-08-12 03:23:22,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1433920.0, ans=0.125 2024-08-12 03:23:34,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1434020.0, ans=0.2 2024-08-12 03:23:44,721 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 03:23:45,702 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.584e+01 3.018e+01 3.555e+01 5.734e+01, threshold=6.036e+01, percent-clipped=3.0 2024-08-12 03:23:49,751 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 03:23:53,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1434120.0, ans=0.0 2024-08-12 03:23:54,272 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 03:23:57,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1434220.0, ans=0.125 2024-08-12 03:24:11,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13000, loss[loss=0.08292, beats_loss=0.01016, ecapa_loss=0.0001506, whisper_loss=0.07126, over 15598.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01116, ecapa_loss=0.000186, whisper_loss=0.0917, over 3878924.58 frames. ], batch size: 59, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:24:13,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1434320.0, ans=0.125 2024-08-12 03:24:38,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1434420.0, ans=0.05 2024-08-12 03:24:49,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.30 vs. limit=10.0 2024-08-12 03:24:54,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1434620.0, ans=0.125 2024-08-12 03:25:02,289 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 03:25:03,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1434620.0, ans=0.0 2024-08-12 03:25:15,294 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 03:25:23,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2024-08-12 03:25:24,290 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13050, loss[loss=0.09131, beats_loss=0.01011, ecapa_loss=0.0002166, whisper_loss=0.07903, over 20237.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01117, ecapa_loss=0.0001861, whisper_loss=0.09171, over 3879591.76 frames. ], batch size: 83, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:25:30,283 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 03:25:33,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1434820.0, ans=0.125 2024-08-12 03:25:40,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1434920.0, ans=0.0 2024-08-12 03:25:53,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1435020.0, ans=0.125 2024-08-12 03:25:55,928 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
20 from LS+wenet, 30 from Vox, 40 from AS
2024-08-12 03:26:12,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.574e+01 2.930e+01 3.375e+01 4.949e+01, threshold=5.859e+01, percent-clipped=0.0
2024-08-12 03:26:17,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1435120.0, ans=0.125
2024-08-12 03:26:22,429 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 from AS
2024-08-12 03:26:27,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1435220.0, ans=0.125
2024-08-12 03:26:41,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13100, loss[loss=0.1108, beats_loss=0.009758, ecapa_loss=0.0002029, whisper_loss=0.099, over 20879.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01116, ecapa_loss=0.0001868, whisper_loss=0.09127, over 3893034.32 frames. ], batch size: 84, lr: 6.19e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:27:11,875 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 20 from Vox, 21 from AS
2024-08-12 03:27:41,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1435720.0, ans=0.125
2024-08-12 03:27:48,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1435720.0, ans=0.0
2024-08-12 03:27:56,458 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13150, loss[loss=0.1002, beats_loss=0.01196, ecapa_loss=0.0001809, whisper_loss=0.08648, over 16790.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01118, ecapa_loss=0.0001853, whisper_loss=0.09178, over 3895018.21 frames. ], batch size: 66, lr: 6.19e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:28:08,038 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 28 from LS+wenet, 15 from Vox, 27 from AS
2024-08-12 03:28:16,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1435920.0, ans=0.0
2024-08-12 03:28:26,479 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.441e+05
2024-08-12 03:28:29,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1436020.0, ans=0.5
2024-08-12 03:28:31,779 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 29 from Vox, 26 from AS
2024-08-12 03:28:43,062 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.467e+01 2.835e+01 3.173e+01 4.953e+01, threshold=5.670e+01, percent-clipped=0.0
2024-08-12 03:29:07,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1436320.0, ans=0.2
2024-08-12 03:29:08,665 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13200, loss[loss=0.1009, beats_loss=0.01395, ecapa_loss=0.0001736, whisper_loss=0.08522, over 20520.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01117, ecapa_loss=0.0001849, whisper_loss=0.09167, over 3857307.02 frames. ], batch size: 81, lr: 6.19e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:29:29,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1436420.0, ans=0.125
2024-08-12 03:29:37,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1436520.0, ans=0.125
2024-08-12 03:29:37,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1436520.0, ans=0.0
2024-08-12 03:29:47,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1436520.0, ans=0.125
2024-08-12 03:29:58,829 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 from AS
2024-08-12 03:30:04,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0
2024-08-12 03:30:05,481 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 from AS
2024-08-12 03:30:13,818 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 from AS
2024-08-12 03:30:17,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1436720.0, ans=0.2
2024-08-12 03:30:20,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1436720.0, ans=0.125
2024-08-12 03:30:20,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1436720.0, ans=0.125
2024-08-12 03:30:22,593 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13250, loss[loss=0.1057, beats_loss=0.007929, ecapa_loss=0.0001925, whisper_loss=0.09586, over 15360.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01107, ecapa_loss=0.0001871, whisper_loss=0.09224, over 3836551.09 frames. ], batch size: 58, lr: 6.19e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:30:30,258 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 13 from Vox, 27 from AS
2024-08-12 03:30:34,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1436820.0, ans=0.125
2024-08-12 03:30:41,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1436920.0, ans=0.125
2024-08-12 03:31:01,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1437020.0, ans=0.0
2024-08-12 03:31:06,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1437120.0, ans=0.0
2024-08-12 03:31:09,110 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 23 from Vox, 34 from AS
2024-08-12 03:31:09,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1437120.0, ans=0.125
2024-08-12 03:31:10,209 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.496e+01 2.755e+01 3.152e+01 5.278e+01, threshold=5.510e+01, percent-clipped=0.0
2024-08-12 03:31:12,174 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 21 from Vox, 23 from AS
2024-08-12 03:31:14,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1437120.0, ans=0.05
2024-08-12 03:31:31,489 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 21 from Vox, 21 from AS
2024-08-12 03:31:37,496 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13300, loss[loss=0.1182, beats_loss=0.01015, ecapa_loss=0.0001967, whisper_loss=0.1061, over 19253.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01104, ecapa_loss=0.0001862, whisper_loss=0.09215, over 3825448.98 frames. ], batch size: 78, lr: 6.18e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:31:38,460 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0
2024-08-12 03:31:43,396 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0
2024-08-12 03:31:49,029 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.29 vs. limit=22.5
2024-08-12 03:32:02,759 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 12 from LS+wenet, 21 from Vox, 23 from AS
2024-08-12 03:32:11,926 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 11 from Vox, 39 from AS
2024-08-12 03:32:26,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1437620.0, ans=0.1
2024-08-12 03:32:30,711 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS
2024-08-12 03:32:36,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1437720.0, ans=0.125
2024-08-12 03:32:50,787 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13350, loss[loss=0.1154, beats_loss=0.009695, ecapa_loss=0.0001781, whisper_loss=0.1039, over 22969.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01109, ecapa_loss=0.0001849, whisper_loss=0.09208, over 3814105.81 frames. ], batch size: 88, lr: 6.18e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:33:06,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1437920.0, ans=0.125
2024-08-12 03:33:08,566 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=22.5
2024-08-12 03:33:09,274 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 12 from Vox, 31 from AS
2024-08-12 03:33:10,933 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 16 from Vox, 37 from AS
2024-08-12 03:33:37,931 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.604e+01 2.851e+01 3.185e+01 1.772e+02, threshold=5.702e+01, percent-clipped=1.0
2024-08-12 03:33:45,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1438120.0, ans=0.125
2024-08-12 03:33:46,869 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 32 from LS+wenet, 6 from Vox, 26 from AS
2024-08-12 03:34:01,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1438220.0, ans=0.125
2024-08-12 03:34:04,009 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13400, loss[loss=0.1054, beats_loss=0.008256, ecapa_loss=0.0001734, whisper_loss=0.09542, over 18025.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01105, ecapa_loss=0.0001836, whisper_loss=0.09304, over 3823145.92 frames. ], batch size: 68, lr: 6.18e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:34:04,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1438320.0, ans=0.125
2024-08-12 03:34:05,816 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 22 from LS+wenet, 28 from Vox, 45 from AS
2024-08-12 03:34:07,070 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 from AS
2024-08-12 03:34:13,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1438320.0, ans=0.125
2024-08-12 03:34:16,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1438320.0, ans=0.125
2024-08-12 03:34:19,182 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 from AS
2024-08-12 03:34:30,533 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 from AS
2024-08-12 03:34:42,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=15.0
2024-08-12 03:34:49,286 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 21 from Vox, 19 from AS
2024-08-12 03:34:52,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=12.0
2024-08-12 03:35:15,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13450, loss[loss=0.1036, beats_loss=0.009616, ecapa_loss=0.0002431, whisper_loss=0.09152, over 21633.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.011, ecapa_loss=0.0001856, whisper_loss=0.09252, over 3840351.19 frames. ], batch size: 92, lr: 6.18e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:35:17,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0
2024-08-12 03:35:20,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0
2024-08-12 03:35:42,117 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS
2024-08-12 03:35:51,031 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=12.0
2024-08-12 03:35:53,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1439020.0, ans=0.1
2024-08-12 03:35:53,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1439020.0, ans=0.09899494936611666
2024-08-12 03:36:00,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.64 vs. limit=10.0
2024-08-12 03:36:02,176 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.531e+01 2.871e+01 3.206e+01 5.320e+01, threshold=5.741e+01, percent-clipped=0.0
2024-08-12 03:36:10,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1439120.0, ans=0.0
2024-08-12 03:36:23,920 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.43 vs. limit=22.5
2024-08-12 03:36:26,647 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS
2024-08-12 03:36:29,190 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13500, loss[loss=0.09647, beats_loss=0.01168, ecapa_loss=0.0002063, whisper_loss=0.08273, over 19530.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.000185, whisper_loss=0.09251, over 3892188.67 frames. ], batch size: 84, lr: 6.18e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:36:41,466 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=12.0
2024-08-12 03:36:43,981 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.966e-01
2024-08-12 03:36:47,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1439420.0, ans=0.125
2024-08-12 03:36:48,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1439420.0, ans=0.125
2024-08-12 03:36:53,665 INFO [train_multi_KD3.py:844] (2/4) A total of 97 cuts. 27 from LS+wenet, 18 from Vox, 52 from AS
2024-08-12 03:36:59,434 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 24 from Vox, 35 from AS
2024-08-12 03:37:04,650 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.64 vs. limit=22.5
2024-08-12 03:37:20,503 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0
2024-08-12 03:37:41,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13550, loss[loss=0.1111, beats_loss=0.01261, ecapa_loss=0.0001901, whisper_loss=0.09656, over 22526.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01108, ecapa_loss=0.0001848, whisper_loss=0.09254, over 3897673.30 frames. ], batch size: 91, lr: 6.18e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:37:59,802 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 03:38:23,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1440020.0, ans=0.125
2024-08-12 03:38:28,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.568e+01 2.866e+01 3.422e+01 5.610e+01, threshold=5.733e+01, percent-clipped=0.0
2024-08-12 03:38:29,096 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.365e-02
2024-08-12 03:38:47,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1440220.0, ans=0.125
2024-08-12 03:38:52,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1440320.0, ans=10.0
2024-08-12 03:38:53,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13600, loss[loss=0.1029, beats_loss=0.01094, ecapa_loss=0.0002047, whisper_loss=0.08996, over 21553.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01111, ecapa_loss=0.0001836, whisper_loss=0.09252, over 3920333.06 frames. ], batch size: 89, lr: 6.18e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:38:55,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1440320.0, ans=0.125
2024-08-12 03:38:59,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1440320.0, ans=0.09899494936611666
2024-08-12 03:39:21,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1440520.0, ans=0.0
2024-08-12 03:39:34,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.25 vs. limit=10.0
2024-08-12 03:39:37,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1440620.0, ans=0.0
2024-08-12 03:39:38,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1440620.0, ans=0.125
2024-08-12 03:40:05,130 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13650, loss[loss=0.1086, beats_loss=0.01191, ecapa_loss=0.0001754, whisper_loss=0.09489, over 13524.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01112, ecapa_loss=0.0001841, whisper_loss=0.09247, over 3905876.09 frames. ], batch size: 55, lr: 6.18e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:40:07,108 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.399e+02
2024-08-12 03:40:08,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1440820.0, ans=0.1
2024-08-12 03:40:10,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1440820.0, ans=0.2
2024-08-12 03:40:10,653 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 03:40:14,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1440820.0, ans=0.2
2024-08-12 03:40:24,251 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 29 from Vox, 33 from AS
2024-08-12 03:40:27,474 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 from AS
2024-08-12 03:40:30,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1440920.0, ans=0.0
2024-08-12 03:40:37,759 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.354e+02
2024-08-12 03:40:40,532 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=12.0
2024-08-12 03:40:50,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.520e+01 2.826e+01 3.243e+01 5.319e+01, threshold=5.652e+01, percent-clipped=0.0
2024-08-12 03:40:51,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1441120.0, ans=0.0
2024-08-12 03:41:01,830 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 from AS
2024-08-12 03:41:02,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5
2024-08-12 03:41:04,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1441220.0, ans=0.125
2024-08-12 03:41:10,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1441220.0, ans=0.125
2024-08-12 03:41:10,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1441220.0, ans=0.1
2024-08-12 03:41:17,329 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13700, loss[loss=0.08703, beats_loss=0.01404, ecapa_loss=0.0001667, whisper_loss=0.07133, over 15099.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0111, ecapa_loss=0.0001839, whisper_loss=0.09324, over 3912473.52 frames. ], batch size: 59, lr: 6.18e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:41:24,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1441320.0, ans=0.1
2024-08-12 03:41:28,822 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 12 from Vox, 24 from AS
2024-08-12 03:41:34,714 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 from AS
2024-08-12 03:41:36,031 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 from AS
2024-08-12 03:41:39,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1441420.0, ans=0.125
2024-08-12 03:41:48,019 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 16 from Vox, 24 from AS
2024-08-12 03:41:51,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0
2024-08-12 03:41:56,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1441520.0, ans=0.125
2024-08-12 03:42:04,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1441620.0, ans=0.1
2024-08-12 03:42:16,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1441720.0, ans=0.125
2024-08-12 03:42:18,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1441720.0, ans=0.2
2024-08-12 03:42:19,936 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0
2024-08-12 03:42:27,332 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13750, loss[loss=0.1333, beats_loss=0.008796, ecapa_loss=0.0001921, whisper_loss=0.1226, over 22887.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01111, ecapa_loss=0.0001834, whisper_loss=0.09298, over 3893053.56 frames. ], batch size: 89, lr: 6.17e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:42:35,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.16 vs. limit=15.0
2024-08-12 03:42:54,845 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 17 from Vox, 45 from AS
2024-08-12 03:43:01,884 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 12 from Vox, 33 from AS
2024-08-12 03:43:06,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1442020.0, ans=0.0
2024-08-12 03:43:08,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1442120.0, ans=0.2
2024-08-12 03:43:11,922 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.531e+01 2.738e+01 3.278e+01 4.185e+01, threshold=5.475e+01, percent-clipped=0.0
2024-08-12 03:43:20,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1442120.0, ans=0.0
2024-08-12 03:43:22,873 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 from AS
2024-08-12 03:43:28,726 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 26 from LS+wenet, 29 from Vox, 40 from AS
2024-08-12 03:43:36,693 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-08-12 03:43:38,559 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13800, loss[loss=0.119, beats_loss=0.01116, ecapa_loss=0.0002103, whisper_loss=0.1058, over 22592.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01105, ecapa_loss=0.0001839, whisper_loss=0.09349, over 3896699.19 frames. ], batch size: 92, lr: 6.17e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:43:54,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0
2024-08-12 03:43:57,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1442420.0, ans=0.125
2024-08-12 03:43:58,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1442420.0, ans=0.0
2024-08-12 03:44:06,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1442520.0, ans=0.1
2024-08-12 03:44:22,842 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 from AS
2024-08-12 03:44:29,420 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 from AS
2024-08-12 03:44:51,525 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13850, loss[loss=0.07326, beats_loss=0.01647, ecapa_loss=0.0001716, whisper_loss=0.05507, over 14419.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01107, ecapa_loss=0.0001845, whisper_loss=0.09321, over 3892230.06 frames. ], batch size: 60, lr: 6.17e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:44:51,754 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 from AS
2024-08-12 03:45:04,270 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 16 from Vox, 36 from AS
2024-08-12 03:45:06,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1442920.0, ans=0.025
2024-08-12 03:45:22,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1443020.0, ans=0.1
2024-08-12 03:45:38,609 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.591e+01 3.040e+01 3.441e+01 5.923e+01, threshold=6.079e+01, percent-clipped=1.0
2024-08-12 03:45:40,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1443120.0, ans=0.95
2024-08-12 03:46:01,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1443220.0, ans=0.0
2024-08-12 03:46:04,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13900, loss[loss=0.1174, beats_loss=0.01015, ecapa_loss=0.0002292, whisper_loss=0.1049, over 23356.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01098, ecapa_loss=0.0001858, whisper_loss=0.09397, over 3878468.10 frames. ], batch size: 92, lr: 6.17e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:46:05,585 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 20 from Vox, 43 from AS
2024-08-12 03:46:06,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1443320.0, ans=0.125
2024-08-12 03:46:27,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1443420.0, ans=0.125
2024-08-12 03:46:31,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1443520.0, ans=0.125
2024-08-12 03:46:52,617 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.873e-01
2024-08-12 03:47:01,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1443720.0, ans=0.1
2024-08-12 03:47:13,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1443820.0, ans=0.125
2024-08-12 03:47:14,613 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 13950, loss[loss=0.1049, beats_loss=0.01024, ecapa_loss=0.0001812, whisper_loss=0.09281, over 17849.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01103, ecapa_loss=0.000187, whisper_loss=0.09334, over 3869592.25 frames. ], batch size: 70, lr: 6.17e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:47:16,265 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 from AS
2024-08-12 03:47:19,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.00 vs. limit=15.0
2024-08-12 03:47:23,632 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.64 vs. limit=15.0
2024-08-12 03:47:31,570 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 from AS
2024-08-12 03:47:41,867 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 17 from Vox, 38 from AS
2024-08-12 03:47:48,630 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 from AS
2024-08-12 03:47:50,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1444020.0, ans=0.125
2024-08-12 03:47:55,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1444120.0, ans=0.125
2024-08-12 03:47:59,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.550e+01 2.827e+01 3.293e+01 5.052e+01, threshold=5.654e+01, percent-clipped=0.0
2024-08-12 03:48:05,936 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 from AS
2024-08-12 03:48:19,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1444220.0, ans=0.125
2024-08-12 03:48:24,485 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 14000, loss[loss=0.1068, beats_loss=0.01012, ecapa_loss=0.0001941, whisper_loss=0.0947, over 17161.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0111, ecapa_loss=0.0001849, whisper_loss=0.09299, over 3885726.24 frames. ], batch size: 69, lr: 6.17e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:48:26,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1444320.0, ans=0.125
2024-08-12 03:48:30,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1444320.0, ans=0.5
2024-08-12 03:48:37,671 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.676e-01
2024-08-12 03:48:52,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0
2024-08-12 03:49:05,647 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0
2024-08-12 03:49:09,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1444620.0, ans=0.125
2024-08-12 03:49:24,920 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 from AS
2024-08-12 03:49:32,416 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 14 from Vox, 22 from AS
2024-08-12 03:49:34,779 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 14050, loss[loss=0.1075, beats_loss=0.01066, ecapa_loss=0.0001765, whisper_loss=0.09503, over 21933.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01106, ecapa_loss=0.0001844, whisper_loss=0.09296, over 3870192.35 frames. ], batch size: 89, lr: 6.17e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:49:35,722 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0
2024-08-12 03:49:36,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1444820.0, ans=0.125
2024-08-12 03:49:36,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1444820.0, ans=0.125
2024-08-12 03:49:39,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1444820.0, ans=0.125
2024-08-12 03:49:45,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0
2024-08-12 03:49:46,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1444820.0, ans=0.125
2024-08-12 03:50:07,171 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 27 from LS+wenet, 14 from Vox, 20 from AS
2024-08-12 03:50:11,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1445020.0, ans=0.2
2024-08-12 03:50:19,574 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.615e+01 2.934e+01 3.537e+01 1.110e+02, threshold=5.868e+01, percent-clipped=2.0
2024-08-12 03:50:44,821 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 14100, loss[loss=0.1177, beats_loss=0.009827, ecapa_loss=0.0002019, whisper_loss=0.1058, over 22952.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01113, ecapa_loss=0.000183, whisper_loss=0.09282, over 3889122.42 frames. ], batch size: 92, lr: 6.17e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:51:04,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1445420.0, ans=10.0
2024-08-12 03:51:15,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.24 vs. limit=15.0
2024-08-12 03:51:18,030 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 from AS
2024-08-12 03:51:41,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1445720.0, ans=0.04949747468305833
2024-08-12 03:51:48,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1445720.0, ans=0.1
2024-08-12 03:51:53,393 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 14150, loss[loss=0.1048, beats_loss=0.01048, ecapa_loss=0.0002146, whisper_loss=0.09215, over 20303.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01111, ecapa_loss=0.0001823, whisper_loss=0.09298, over 3876547.61 frames. ], batch size: 88, lr: 6.17e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:52:00,057 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 from AS
2024-08-12 03:52:03,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1445820.0, ans=0.0
2024-08-12 03:52:11,432 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 17 from Vox, 37 from AS
2024-08-12 03:52:19,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1446020.0, ans=0.1
2024-08-12 03:52:25,096 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 03:52:27,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1446020.0, ans=0.125
2024-08-12 03:52:30,176 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS
2024-08-12 03:52:36,799 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.480e+01 2.708e+01 3.118e+01 5.988e+01, threshold=5.416e+01, percent-clipped=1.0
2024-08-12 03:52:54,299 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 03:52:54,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. limit=10.0
2024-08-12 03:52:56,526 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 from AS
2024-08-12 03:53:02,289 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 14200, loss[loss=0.08854, beats_loss=0.01262, ecapa_loss=0.0002077, whisper_loss=0.07384, over 19741.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01115, ecapa_loss=0.0001805, whisper_loss=0.09258, over 3889626.35 frames. ], batch size: 84, lr: 6.17e-03, grad_scale: 1.152921504606847e+18
2024-08-12 03:53:04,256 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 03:53:06,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1446320.0, ans=0.0 2024-08-12 03:53:09,993 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-12 03:53:14,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2024-08-12 03:53:23,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1446420.0, ans=15.0 2024-08-12 03:53:30,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1446520.0, ans=0.0 2024-08-12 03:53:34,198 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 03:53:51,052 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 03:54:12,791 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 14250, loss[loss=0.1049, beats_loss=0.01349, ecapa_loss=0.000181, whisper_loss=0.08961, over 21262.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01116, ecapa_loss=0.0001802, whisper_loss=0.09225, over 3909968.17 frames. ], batch size: 87, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:54:18,709 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
20 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-12 03:54:31,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1446920.0, ans=0.1 2024-08-12 03:54:49,475 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.312e+02 2024-08-12 03:54:53,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1447020.0, ans=0.125 2024-08-12 03:54:56,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1447120.0, ans=0.125 2024-08-12 03:54:58,567 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.831e+01 3.136e+01 3.486e+01 5.154e+01, threshold=6.272e+01, percent-clipped=0.0 2024-08-12 03:55:02,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0 2024-08-12 03:55:03,151 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 03:55:11,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.28 vs. limit=10.0 2024-08-12 03:55:15,931 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 03:55:23,950 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 14300, loss[loss=0.1342, beats_loss=0.009348, ecapa_loss=0.0002158, whisper_loss=0.1227, over 23821.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01117, ecapa_loss=0.0001804, whisper_loss=0.09198, over 3938935.14 frames. 
], batch size: 89, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:55:28,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1447320.0, ans=0.2 2024-08-12 03:55:31,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1447320.0, ans=0.125 2024-08-12 03:55:35,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1447320.0, ans=0.1 2024-08-12 03:55:46,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1447420.0, ans=0.125 2024-08-12 03:55:49,313 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 03:55:55,588 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 03:56:17,337 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-12 03:56:18,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2024-08-12 03:56:20,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1447720.0, ans=0.1 2024-08-12 03:56:27,987 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 03:56:32,037 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 14350, loss[loss=0.09146, beats_loss=0.01422, ecapa_loss=0.0001459, whisper_loss=0.07578, over 21230.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01117, ecapa_loss=0.0001805, whisper_loss=0.09155, over 3918436.09 frames. 
], batch size: 87, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:56:32,182 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 03:56:50,223 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-12 03:56:53,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=12.0 2024-08-12 03:57:06,161 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 03:57:17,904 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.654e+01 2.989e+01 3.360e+01 6.544e+01, threshold=5.979e+01, percent-clipped=1.0 2024-08-12 03:57:25,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1448120.0, ans=0.1 2024-08-12 03:57:43,246 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 14400, loss[loss=0.1168, beats_loss=0.00736, ecapa_loss=0.0002034, whisper_loss=0.1074, over 19252.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01113, ecapa_loss=0.0001818, whisper_loss=0.09173, over 3906072.86 frames. 
], batch size: 75, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:57:47,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1448320.0, ans=0.0 2024-08-12 03:57:49,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1448320.0, ans=0.125 2024-08-12 03:57:51,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1448320.0, ans=0.125 2024-08-12 03:58:20,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2024-08-12 03:58:27,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5 2024-08-12 03:58:30,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1448620.0, ans=0.035 2024-08-12 03:58:31,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=12.0 2024-08-12 03:58:39,828 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 03:58:40,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1448720.0, ans=0.07 2024-08-12 03:58:43,820 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 03:58:52,023 INFO [train_multi_KD3.py:1116] (2/4) Epoch 10, batch 14450, loss[loss=0.1229, beats_loss=0.006187, ecapa_loss=0.0001962, whisper_loss=0.1148, over 14251.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01115, ecapa_loss=0.000182, whisper_loss=0.09155, over 3855519.18 frames. 
], batch size: 54, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:58:59,347 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.188e-03 2024-08-12 03:59:06,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1448920.0, ans=0.125 2024-08-12 03:59:10,479 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 03:59:15,726 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 03:59:15,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1448920.0, ans=0.0 2024-08-12 03:59:16,995 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 03:59:34,947 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.542e+01 2.850e+01 3.301e+01 1.207e+02, threshold=5.700e+01, percent-clipped=1.0 2024-08-12 03:59:48,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1449220.0, ans=0.0 2024-08-12 04:00:35,398 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 0, loss[loss=0.1071, beats_loss=0.009313, ecapa_loss=0.0001714, whisper_loss=0.09612, over 18533.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.009313, ecapa_loss=0.0001714, whisper_loss=0.09612, over 18533.00 frames. ], batch size: 69, lr: 5.88e-03, grad_scale: 1.152921504606847e+18 2024-08-12 04:00:35,398 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 04:01:15,760 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on ASR_libri: loss=0.2556, beats_loss=0, ecapa_loss=0.0005978, whisper_loss=0.2496, over 922467.00 frames. 
2024-08-12 04:01:31,150 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on SV_voxceleb1: loss=0.004953, beats_loss=0, ecapa_loss=0.0004953, whisper_loss=0, over 939242.00 frames. 2024-08-12 04:03:27,123 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on AT_audioset: loss=0.02449, beats_loss=0.02449, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 04:03:27,133 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 04:03:46,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1449260.0, ans=0.1 2024-08-12 04:03:46,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.41 vs. limit=15.0 2024-08-12 04:04:05,776 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 04:04:06,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1449360.0, ans=0.125 2024-08-12 04:04:24,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1449460.0, ans=0.1 2024-08-12 04:05:12,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1449660.0, ans=0.05 2024-08-12 04:05:12,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1449660.0, ans=0.125 2024-08-12 04:05:13,047 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.329e+00 2024-08-12 04:05:15,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1449660.0, ans=0.125 2024-08-12 04:05:22,281 INFO 
[scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1449660.0, ans=0.125 2024-08-12 04:05:33,357 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 50, loss[loss=0.08849, beats_loss=0.01082, ecapa_loss=0.0002107, whisper_loss=0.07556, over 20397.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001888, whisper_loss=0.09063, over 900341.98 frames. ], batch size: 83, lr: 5.88e-03, grad_scale: 1.152921504606847e+18 2024-08-12 04:06:27,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1449960.0, ans=0.2 2024-08-12 04:06:37,407 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 04:06:42,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=1449960.0, ans=0.1 2024-08-12 04:06:42,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1449960.0, ans=0.0 2024-08-12 04:06:43,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=15.0 2024-08-12 04:07:08,555 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.147e+01 2.961e+01 3.212e+01 3.624e+01 5.944e+01, threshold=6.424e+01, percent-clipped=1.0 2024-08-12 04:07:09,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1450160.0, ans=0.125 2024-08-12 04:07:30,242 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 100, loss[loss=0.09185, beats_loss=0.01149, ecapa_loss=0.0001842, whisper_loss=0.07852, over 21118.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01029, ecapa_loss=0.0001886, whisper_loss=0.09155, over 1531800.85 frames. 
], batch size: 85, lr: 5.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 04:08:08,326 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 04:08:20,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1450360.0, ans=0.125 2024-08-12 04:08:46,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1450460.0, ans=0.125 2024-08-12 04:09:28,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1450560.0, ans=0.2 2024-08-12 04:09:34,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=12.0 2024-08-12 04:09:51,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.76 vs. limit=22.5 2024-08-12 04:09:55,302 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 150, loss[loss=0.09983, beats_loss=0.01034, ecapa_loss=0.0001918, whisper_loss=0.08757, over 22360.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01043, ecapa_loss=0.000186, whisper_loss=0.09195, over 2051529.28 frames. ], batch size: 89, lr: 5.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 04:10:10,188 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 04:10:39,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-12 04:10:41,621 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. 
limit=15.0 2024-08-12 04:10:58,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1450960.0, ans=0.2 2024-08-12 04:11:13,854 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 04:11:24,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1451060.0, ans=0.0 2024-08-12 04:11:38,105 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.724e+01 3.107e+01 3.626e+01 6.235e+01, threshold=6.215e+01, percent-clipped=0.0 2024-08-12 04:11:42,020 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 04:11:54,005 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-12 04:12:04,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 200, loss[loss=0.08912, beats_loss=0.01403, ecapa_loss=0.0001411, whisper_loss=0.07367, over 22936.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01047, ecapa_loss=0.0001863, whisper_loss=0.09244, over 2430675.33 frames. ], batch size: 92, lr: 5.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 04:12:25,762 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 14 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 04:12:31,254 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 04:12:32,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1451360.0, ans=0.025 2024-08-12 04:13:03,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1451460.0, ans=0.125 2024-08-12 04:13:27,318 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
27 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 04:13:48,572 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-12 04:13:50,089 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 04:13:50,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1451660.0, ans=0.125 2024-08-12 04:14:04,534 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 250, loss[loss=0.09503, beats_loss=0.01201, ecapa_loss=0.0001496, whisper_loss=0.08153, over 22318.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01038, ecapa_loss=0.0001864, whisper_loss=0.09341, over 2748354.42 frames. ], batch size: 87, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:14:23,027 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 04:14:25,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1451860.0, ans=0.125 2024-08-12 04:14:26,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1451860.0, ans=0.125 2024-08-12 04:14:34,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1451860.0, ans=0.1 2024-08-12 04:14:40,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1451860.0, ans=0.125 2024-08-12 04:14:50,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1451960.0, ans=0.0 2024-08-12 04:15:20,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1452060.0, ans=0.0 2024-08-12 04:15:22,828 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 04:15:26,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1452060.0, ans=0.125 2024-08-12 04:15:41,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.465e+01 2.658e+01 3.015e+01 5.855e+01, threshold=5.316e+01, percent-clipped=0.0 2024-08-12 04:15:43,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1452160.0, ans=0.125 2024-08-12 04:15:50,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1452160.0, ans=0.125 2024-08-12 04:15:58,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1452160.0, ans=0.125 2024-08-12 04:16:03,582 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 300, loss[loss=0.121, beats_loss=0.01095, ecapa_loss=0.0002018, whisper_loss=0.1081, over 19202.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01062, ecapa_loss=0.0001838, whisper_loss=0.0936, over 3006772.53 frames. ], batch size: 76, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:16:21,336 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
30 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 04:16:27,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1452360.0, ans=0.04949747468305833 2024-08-12 04:16:32,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1452460.0, ans=0.1 2024-08-12 04:16:35,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1452460.0, ans=0.125 2024-08-12 04:16:55,088 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 04:16:55,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1452560.0, ans=0.125 2024-08-12 04:17:03,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=22.5 2024-08-12 04:17:14,695 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 350, loss[loss=0.1181, beats_loss=0.01016, ecapa_loss=0.0001649, whisper_loss=0.1063, over 22990.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01062, ecapa_loss=0.0001844, whisper_loss=0.09291, over 3184124.04 frames. 
], batch size: 91, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:17:20,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1452760.0, ans=0.2 2024-08-12 04:17:37,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1452860.0, ans=0.125 2024-08-12 04:17:44,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1452960.0, ans=0.0 2024-08-12 04:17:58,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1453060.0, ans=0.125 2024-08-12 04:18:05,517 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 04:18:10,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1453060.0, ans=0.0 2024-08-12 04:18:12,765 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 04:18:15,707 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.542e+01 2.799e+01 3.205e+01 6.505e+01, threshold=5.597e+01, percent-clipped=2.0 2024-08-12 04:18:17,330 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 04:18:18,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1453160.0, ans=0.125 2024-08-12 04:18:20,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1453160.0, ans=0.0 2024-08-12 04:18:28,534 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 400, loss[loss=0.1132, beats_loss=0.009116, ecapa_loss=0.000188, whisper_loss=0.1022, over 18914.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01071, ecapa_loss=0.0001832, whisper_loss=0.0925, over 3328064.84 frames. ], batch size: 74, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:18:30,964 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.44 vs. limit=12.0 2024-08-12 04:18:36,704 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.448e+00 2024-08-12 04:18:37,187 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0 2024-08-12 04:18:51,849 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-12 04:19:06,390 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 04:19:06,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1453460.0, ans=0.2 2024-08-12 04:19:40,541 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 450, loss[loss=0.0881, beats_loss=0.01579, ecapa_loss=0.0001345, whisper_loss=0.07097, over 18349.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.0001815, whisper_loss=0.0916, over 3436134.63 frames. ], batch size: 74, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:19:41,402 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. 
limit=6.0 2024-08-12 04:19:52,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1453760.0, ans=0.125 2024-08-12 04:20:11,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1453960.0, ans=0.125 2024-08-12 04:20:25,721 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 28 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 04:20:26,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1454060.0, ans=0.125 2024-08-12 04:20:27,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.30 vs. limit=15.0 2024-08-12 04:20:41,369 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.543e+01 2.883e+01 3.316e+01 4.776e+01, threshold=5.767e+01, percent-clipped=0.0 2024-08-12 04:20:54,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 500, loss[loss=0.09408, beats_loss=0.01113, ecapa_loss=0.0001814, whisper_loss=0.08113, over 22373.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01074, ecapa_loss=0.0001802, whisper_loss=0.0919, over 3514946.61 frames. ], batch size: 88, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:21:11,901 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 04:21:16,387 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 04:21:38,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.27 vs. limit=10.0 2024-08-12 04:21:41,532 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=18.87 vs. 
limit=15.0 2024-08-12 04:21:42,135 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-12 04:21:45,041 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-12 04:21:59,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1454660.0, ans=0.2 2024-08-12 04:22:09,085 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 550, loss[loss=0.122, beats_loss=0.01021, ecapa_loss=0.000142, whisper_loss=0.1104, over 23386.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01086, ecapa_loss=0.0001774, whisper_loss=0.09228, over 3596352.95 frames. ], batch size: 89, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:22:24,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1454860.0, ans=0.025 2024-08-12 04:22:25,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1454860.0, ans=0.125 2024-08-12 04:22:27,904 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 04:22:40,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1454960.0, ans=0.125 2024-08-12 04:22:51,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1455060.0, ans=0.2 2024-08-12 04:22:52,614 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 04:23:04,428 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 04:23:06,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1455160.0, ans=0.1 2024-08-12 04:23:08,367 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.603e+01 2.842e+01 3.155e+01 5.740e+01, threshold=5.685e+01, percent-clipped=0.0 2024-08-12 04:23:13,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1455160.0, ans=0.125 2024-08-12 04:23:21,967 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 600, loss[loss=0.096, beats_loss=0.01154, ecapa_loss=0.0002056, whisper_loss=0.08241, over 21480.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001769, whisper_loss=0.09195, over 3650458.59 frames. ], batch size: 90, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:23:31,538 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-12 04:23:34,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1455260.0, ans=0.125 2024-08-12 04:23:40,282 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 04:23:47,129 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 04:23:50,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1455460.0, ans=0.125 2024-08-12 04:23:59,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1455460.0, ans=0.125 2024-08-12 04:24:07,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1455560.0, ans=0.125 2024-08-12 04:24:12,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2024-08-12 04:24:24,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1455660.0, ans=0.125 2024-08-12 04:24:25,842 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-12 04:24:29,612 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-12 04:24:35,345 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 650, loss[loss=0.1191, beats_loss=0.01122, ecapa_loss=0.000147, whisper_loss=0.1064, over 22044.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01085, ecapa_loss=0.0001767, whisper_loss=0.09252, over 3705825.31 frames. 
], batch size: 84, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:24:35,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1455760.0, ans=0.0 2024-08-12 04:24:48,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1455860.0, ans=0.125 2024-08-12 04:25:03,550 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0 2024-08-12 04:25:16,767 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 04:25:33,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1456160.0, ans=0.125 2024-08-12 04:25:35,462 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.475e+01 2.766e+01 3.282e+01 4.630e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 04:25:41,680 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-12 04:25:48,987 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 700, loss[loss=0.127, beats_loss=0.01381, ecapa_loss=0.0001676, whisper_loss=0.1115, over 14797.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01093, ecapa_loss=0.000177, whisper_loss=0.09205, over 3734088.89 frames. ], batch size: 59, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:25:55,656 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-12 04:26:07,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1456360.0, ans=0.0 2024-08-12 04:26:08,770 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 04:26:09,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.43 vs. limit=22.5 2024-08-12 04:26:17,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1456360.0, ans=0.1 2024-08-12 04:26:38,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1456560.0, ans=0.0 2024-08-12 04:26:41,833 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.63 vs. limit=15.0 2024-08-12 04:26:51,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1456660.0, ans=0.0 2024-08-12 04:27:01,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1456660.0, ans=0.2 2024-08-12 04:27:01,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0 2024-08-12 04:27:07,394 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 750, loss[loss=0.1056, beats_loss=0.01152, ecapa_loss=0.0001603, whisper_loss=0.09245, over 21090.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.0001767, whisper_loss=0.09166, over 3758079.67 frames. ], batch size: 82, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:27:20,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1456760.0, ans=0.125 2024-08-12 04:27:37,904 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 04:27:58,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1457060.0, ans=0.1 2024-08-12 04:28:02,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1457060.0, ans=0.025 2024-08-12 04:28:16,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.543e+01 2.919e+01 3.268e+01 8.785e+01, threshold=5.838e+01, percent-clipped=1.0 2024-08-12 04:28:25,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1457160.0, ans=0.0 2024-08-12 04:28:32,370 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 800, loss[loss=0.1201, beats_loss=0.00943, ecapa_loss=0.0001971, whisper_loss=0.1087, over 16000.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01099, ecapa_loss=0.0001761, whisper_loss=0.09166, over 3809142.10 frames. ], batch size: 63, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:28:36,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0 2024-08-12 04:28:40,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1457260.0, ans=0.125 2024-08-12 04:28:44,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.93 vs. 
limit=15.0 2024-08-12 04:28:49,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1457360.0, ans=0.07 2024-08-12 04:28:54,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1457360.0, ans=0.0 2024-08-12 04:28:56,133 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 04:29:29,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1457560.0, ans=0.125 2024-08-12 04:29:31,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1457560.0, ans=0.125 2024-08-12 04:29:43,152 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 04:29:52,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 850, loss[loss=0.07914, beats_loss=0.01187, ecapa_loss=0.0001944, whisper_loss=0.06532, over 19026.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01094, ecapa_loss=0.0001765, whisper_loss=0.0917, over 3812102.15 frames. ], batch size: 76, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:30:01,437 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 24 from Vox, 16 fro AS 2024-08-12 04:30:10,767 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 04:30:12,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1457860.0, ans=0.2 2024-08-12 04:30:26,956 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
20 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 04:30:29,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1457960.0, ans=0.125 2024-08-12 04:30:56,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1458160.0, ans=0.05 2024-08-12 04:30:57,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.636e+01 2.987e+01 3.471e+01 7.869e+01, threshold=5.974e+01, percent-clipped=5.0 2024-08-12 04:31:05,549 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 04:31:10,946 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 900, loss[loss=0.0812, beats_loss=0.01347, ecapa_loss=0.0001826, whisper_loss=0.0659, over 16431.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01103, ecapa_loss=0.0001752, whisper_loss=0.09113, over 3773143.99 frames. ], batch size: 69, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:31:11,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1458260.0, ans=0.125 2024-08-12 04:31:16,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1458260.0, ans=0.0 2024-08-12 04:31:24,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1458260.0, ans=0.2 2024-08-12 04:32:12,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1458560.0, ans=0.125 2024-08-12 04:32:22,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1458660.0, ans=0.125 2024-08-12 04:32:27,407 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1458660.0, ans=0.125 2024-08-12 04:32:31,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1458660.0, ans=0.1 2024-08-12 04:32:34,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 950, loss[loss=0.1181, beats_loss=0.009622, ecapa_loss=0.0001833, whisper_loss=0.1066, over 23381.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01099, ecapa_loss=0.0001751, whisper_loss=0.09118, over 3807314.89 frames. ], batch size: 89, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:32:44,235 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 04:32:51,873 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.11 vs. limit=10.0 2024-08-12 04:33:02,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1458860.0, ans=0.125 2024-08-12 04:33:24,420 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-12 04:33:44,599 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.653e+01 2.939e+01 3.386e+01 4.997e+01, threshold=5.879e+01, percent-clipped=0.0 2024-08-12 04:33:50,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1459160.0, ans=0.2 2024-08-12 04:34:00,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1000, loss[loss=0.1187, beats_loss=0.005992, ecapa_loss=0.000194, whisper_loss=0.1108, over 14843.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01096, ecapa_loss=0.0001743, whisper_loss=0.09095, over 3802763.05 frames. 
], batch size: 55, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:34:12,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1459260.0, ans=0.1 2024-08-12 04:34:16,465 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 04:34:41,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1459460.0, ans=0.125 2024-08-12 04:34:43,679 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 12 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 04:35:12,937 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 04:35:13,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1459660.0, ans=0.125 2024-08-12 04:35:15,323 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 04:35:15,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1459660.0, ans=0.1 2024-08-12 04:35:21,613 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1050, loss[loss=0.09719, beats_loss=0.01134, ecapa_loss=0.0001511, whisper_loss=0.08434, over 17667.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01097, ecapa_loss=0.0001743, whisper_loss=0.09146, over 3830770.29 frames. ], batch size: 65, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:35:34,630 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-12 04:35:41,979 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 04:35:43,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1459860.0, ans=10.0 2024-08-12 04:35:56,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1459960.0, ans=0.125 2024-08-12 04:36:01,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1459960.0, ans=0.2 2024-08-12 04:36:33,432 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.762e+01 2.974e+01 3.480e+01 4.829e+01, threshold=5.949e+01, percent-clipped=0.0 2024-08-12 04:36:48,809 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1100, loss[loss=0.1241, beats_loss=0.01059, ecapa_loss=0.0001651, whisper_loss=0.1119, over 23186.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01092, ecapa_loss=0.0001736, whisper_loss=0.0918, over 3811719.76 frames. ], batch size: 88, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:36:48,935 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 04:37:09,981 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 04:37:10,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1460360.0, ans=0.0 2024-08-12 04:37:12,090 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 04:37:14,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1460360.0, ans=0.125 2024-08-12 04:37:17,299 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 04:37:38,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1460560.0, ans=0.0 2024-08-12 04:37:39,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1460560.0, ans=0.125 2024-08-12 04:37:51,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1460560.0, ans=0.0 2024-08-12 04:38:07,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1460660.0, ans=10.0 2024-08-12 04:38:13,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1150, loss[loss=0.09976, beats_loss=0.01427, ecapa_loss=0.0001147, whisper_loss=0.08434, over 16734.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01105, ecapa_loss=0.0001723, whisper_loss=0.09156, over 3854140.91 frames. ], batch size: 63, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:38:17,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1460760.0, ans=0.0 2024-08-12 04:38:24,102 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 04:38:29,420 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 04:39:08,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.84 vs. 
limit=12.0 2024-08-12 04:39:12,372 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-12 04:39:13,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1461060.0, ans=0.0 2024-08-12 04:39:19,523 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.588e+01 2.774e+01 3.143e+01 5.777e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-12 04:39:27,629 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-12 04:39:33,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1200, loss[loss=0.09864, beats_loss=0.008104, ecapa_loss=0.000177, whisper_loss=0.08877, over 14403.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01105, ecapa_loss=0.0001726, whisper_loss=0.09093, over 3827702.18 frames. ], batch size: 56, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:39:43,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1461260.0, ans=0.125 2024-08-12 04:40:25,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1461560.0, ans=0.5 2024-08-12 04:40:37,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1461560.0, ans=0.125 2024-08-12 04:40:42,612 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 04:40:53,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1461660.0, ans=0.05 2024-08-12 04:40:57,456 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1250, loss[loss=0.1072, beats_loss=0.0112, ecapa_loss=0.0001559, whisper_loss=0.09439, over 20216.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01117, ecapa_loss=0.0001724, whisper_loss=0.09047, over 3841826.92 frames. ], batch size: 79, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:41:14,815 INFO [train_multi_KD3.py:844] (2/4) A total of 97 cuts. 29 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-12 04:41:19,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1461860.0, ans=0.05 2024-08-12 04:41:19,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1461860.0, ans=0.1 2024-08-12 04:41:36,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2024-08-12 04:41:49,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1462060.0, ans=0.125 2024-08-12 04:42:02,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1462060.0, ans=10.0 2024-08-12 04:42:06,773 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 04:42:08,584 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.564e+01 2.833e+01 3.209e+01 5.019e+01, threshold=5.666e+01, percent-clipped=0.0 2024-08-12 04:42:16,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1462160.0, ans=0.125 2024-08-12 04:42:24,031 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1300, loss[loss=0.1096, beats_loss=0.01092, ecapa_loss=0.0001526, whisper_loss=0.09714, over 23356.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01113, ecapa_loss=0.0001728, whisper_loss=0.09116, over 3874824.16 frames. 
], batch size: 93, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:42:26,250 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-12 04:42:32,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1462260.0, ans=0.125 2024-08-12 04:42:34,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1462260.0, ans=0.125 2024-08-12 04:42:36,738 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 04:42:40,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1462360.0, ans=0.125 2024-08-12 04:42:59,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0 2024-08-12 04:43:02,393 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=15.0 2024-08-12 04:43:16,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1462560.0, ans=0.0 2024-08-12 04:43:35,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1462660.0, ans=0.125 2024-08-12 04:43:35,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1462660.0, ans=0.125 2024-08-12 04:43:46,696 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1350, loss[loss=0.09747, beats_loss=0.01209, ecapa_loss=0.0001625, whisper_loss=0.08375, over 19127.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01109, ecapa_loss=0.0001728, whisper_loss=0.0913, over 3865848.75 frames. 
], batch size: 74, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:43:49,131 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 04:43:49,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1462760.0, ans=0.0 2024-08-12 04:43:54,224 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 04:44:06,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.90 vs. limit=10.0 2024-08-12 04:44:12,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1462860.0, ans=0.1 2024-08-12 04:44:14,271 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-12 04:44:16,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1462860.0, ans=0.0 2024-08-12 04:44:29,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1462960.0, ans=0.125 2024-08-12 04:44:31,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1462960.0, ans=0.125 2024-08-12 04:44:34,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1462960.0, ans=0.125 2024-08-12 04:44:50,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1463060.0, ans=0.0 2024-08-12 04:44:58,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.596e+01 2.848e+01 3.248e+01 6.741e+01, threshold=5.696e+01, percent-clipped=1.0 2024-08-12 04:45:11,612 INFO 
[train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1400, loss[loss=0.09508, beats_loss=0.01181, ecapa_loss=0.0001318, whisper_loss=0.08196, over 15478.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01101, ecapa_loss=0.0001736, whisper_loss=0.09124, over 3845794.66 frames. ], batch size: 59, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:45:33,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1463360.0, ans=0.125 2024-08-12 04:45:46,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1463460.0, ans=0.0 2024-08-12 04:45:47,146 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 04:45:55,136 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-12 04:45:57,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1463460.0, ans=0.1 2024-08-12 04:46:01,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1463560.0, ans=0.0 2024-08-12 04:46:02,666 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-12 04:46:21,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1463660.0, ans=0.0 2024-08-12 04:46:22,862 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 04:46:59,690 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1450, loss[loss=0.112, beats_loss=0.0121, ecapa_loss=0.0001403, whisper_loss=0.09848, over 16020.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01097, ecapa_loss=0.0001736, whisper_loss=0.09147, over 3834436.21 frames. 
], batch size: 63, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:47:03,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1463760.0, ans=0.2 2024-08-12 04:47:09,333 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.08 vs. limit=22.5 2024-08-12 04:47:18,377 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 04:47:20,003 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 04:47:55,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1464060.0, ans=0.125 2024-08-12 04:48:05,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1464160.0, ans=0.125 2024-08-12 04:48:05,909 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.411e+01 2.800e+01 3.262e+01 9.547e+01, threshold=5.600e+01, percent-clipped=2.0 2024-08-12 04:48:14,977 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-12 04:48:15,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1464160.0, ans=0.125 2024-08-12 04:48:20,863 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1500, loss[loss=0.09492, beats_loss=0.01169, ecapa_loss=0.0001452, whisper_loss=0.08178, over 19972.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01101, ecapa_loss=0.0001739, whisper_loss=0.0911, over 3835011.27 frames. ], batch size: 74, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:48:34,016 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
22 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-12 04:48:39,304 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 04:48:58,617 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 04:49:01,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1464460.0, ans=0.0 2024-08-12 04:49:07,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1464560.0, ans=0.125 2024-08-12 04:49:21,250 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 27 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 04:49:26,781 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 04:49:30,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1464660.0, ans=0.95 2024-08-12 04:49:40,887 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1550, loss[loss=0.08385, beats_loss=0.01261, ecapa_loss=0.0001755, whisper_loss=0.06949, over 17776.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.0001724, whisper_loss=0.09133, over 3817041.03 frames. ], batch size: 74, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:50:13,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1464960.0, ans=0.0 2024-08-12 04:50:14,611 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-12 04:50:19,057 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
19 from LS+wenet, 21 from Vox, 32 from AS 2024-08-12 04:50:45,175 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.381e+01 2.640e+01 3.042e+01 4.916e+01, threshold=5.281e+01, percent-clipped=0.0 2024-08-12 04:50:46,175 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2024-08-12 04:50:56,279 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS 2024-08-12 04:50:58,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1465260.0, ans=0.1 2024-08-12 04:50:59,570 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1600, loss[loss=0.1042, beats_loss=0.01165, ecapa_loss=0.0001745, whisper_loss=0.09083, over 21496.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01102, ecapa_loss=0.0001724, whisper_loss=0.09099, over 3815361.01 frames. ], batch size: 86, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:51:03,096 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 23 from LS+wenet, 9 from Vox, 24 from AS 2024-08-12 04:51:05,742 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 20 from Vox, 21 from AS 2024-08-12 04:51:28,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1465360.0, ans=0.1 2024-08-12 04:51:32,355 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 14 from Vox, 41 from AS 2024-08-12 04:51:46,643 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 17 from Vox, 42 from AS 2024-08-12 04:51:54,591 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts.
22 from LS+wenet, 22 from Vox, 27 from AS 2024-08-12 04:52:06,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1465660.0, ans=6.0 2024-08-12 04:52:16,736 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1650, loss[loss=0.07658, beats_loss=0.01497, ecapa_loss=0.000127, whisper_loss=0.06034, over 15583.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01109, ecapa_loss=0.000172, whisper_loss=0.09072, over 3824393.61 frames. ], batch size: 62, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:52:26,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1465760.0, ans=0.0 2024-08-12 04:52:26,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1465760.0, ans=0.2 2024-08-12 04:52:44,328 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 16 from Vox, 38 from AS 2024-08-12 04:53:03,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1466060.0, ans=0.125 2024-08-12 04:53:19,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.459e+01 2.653e+01 3.242e+01 4.506e+01, threshold=5.307e+01, percent-clipped=0.0 2024-08-12 04:53:24,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1466160.0, ans=0.125 2024-08-12 04:53:26,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1466160.0, ans=0.125 2024-08-12 04:53:29,124 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts.
23 from LS+wenet, 21 from Vox, 41 from AS 2024-08-12 04:53:33,384 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1700, loss[loss=0.1007, beats_loss=0.01166, ecapa_loss=0.0001451, whisper_loss=0.0876, over 17304.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01097, ecapa_loss=0.0001731, whisper_loss=0.09168, over 3834211.55 frames. ], batch size: 68, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:53:34,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1466260.0, ans=0.0 2024-08-12 04:53:40,029 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 35 from LS+wenet, 16 from Vox, 32 from AS 2024-08-12 04:53:57,305 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 22 from Vox, 19 from AS 2024-08-12 04:54:01,027 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=12.0 2024-08-12 04:54:01,510 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 13 from Vox, 26 from AS 2024-08-12 04:54:19,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1466560.0, ans=0.2 2024-08-12 04:54:22,415 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.20 vs. limit=15.0 2024-08-12 04:54:34,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1466660.0, ans=0.1 2024-08-12 04:54:41,030 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 from AS 2024-08-12 04:54:50,307 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1750, loss[loss=0.1091, beats_loss=0.01257, ecapa_loss=0.0001893, whisper_loss=0.09459, over 18866.00 frames.
], tot_loss[loss=0.1048, beats_loss=0.01094, ecapa_loss=0.0001738, whisper_loss=0.09217, over 3846658.37 frames. ], batch size: 79, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:55:34,157 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 from AS 2024-08-12 04:55:34,389 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.508e-03 2024-08-12 04:55:41,215 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.801e-02 2024-08-12 04:55:50,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1467060.0, ans=0.1 2024-08-12 04:55:53,972 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.424e+01 2.723e+01 3.040e+01 5.517e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-12 04:55:59,176 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 from AS 2024-08-12 04:56:07,633 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1800, loss[loss=0.1161, beats_loss=0.009606, ecapa_loss=0.0001924, whisper_loss=0.1045, over 19984.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01089, ecapa_loss=0.0001729, whisper_loss=0.09287, over 3852444.81 frames. ], batch size: 81, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:56:21,974 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 25 from Vox, 29 from AS 2024-08-12 04:56:33,777 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 27 from Vox, 43 from AS 2024-08-12 04:56:35,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1467360.0, ans=0.125 2024-08-12 04:56:43,842 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts.
21 from LS+wenet, 15 from Vox, 24 from AS 2024-08-12 04:56:45,628 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 from AS 2024-08-12 04:56:54,762 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 from AS 2024-08-12 04:56:57,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1467560.0, ans=0.125 2024-08-12 04:57:04,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1467560.0, ans=0.2 2024-08-12 04:57:04,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1467560.0, ans=0.2 2024-08-12 04:57:12,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1467660.0, ans=0.125 2024-08-12 04:57:22,168 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 from AS 2024-08-12 04:57:24,460 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1850, loss[loss=0.1248, beats_loss=0.01084, ecapa_loss=0.0001736, whisper_loss=0.1122, over 22931.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01088, ecapa_loss=0.000174, whisper_loss=0.0931, over 3861999.72 frames. ], batch size: 90, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:57:37,320 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 from AS 2024-08-12 04:57:55,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-12 04:57:57,000 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 from AS 2024-08-12 04:58:04,924 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts.
21 from LS+wenet, 19 from Vox, 33 from AS 2024-08-12 04:58:07,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1467960.0, ans=0.0 2024-08-12 04:58:15,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1468060.0, ans=0.125 2024-08-12 04:58:27,083 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.532e+01 2.817e+01 3.253e+01 1.073e+02, threshold=5.635e+01, percent-clipped=1.0 2024-08-12 04:58:29,727 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 from AS 2024-08-12 04:58:37,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1468160.0, ans=0.0 2024-08-12 04:58:41,992 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1900, loss[loss=0.1064, beats_loss=0.01143, ecapa_loss=0.0001836, whisper_loss=0.0931, over 19682.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01091, ecapa_loss=0.0001742, whisper_loss=0.09248, over 3839686.19 frames. ], batch size: 80, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:58:57,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2024-08-12 04:59:01,356 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
32 from LS+wenet, 23 from Vox, 37 from AS 2024-08-12 04:59:01,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1468360.0, ans=0.0 2024-08-12 04:59:04,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1468360.0, ans=0.0 2024-08-12 04:59:06,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1468360.0, ans=0.125 2024-08-12 04:59:11,313 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.00 vs. limit=12.0 2024-08-12 04:59:22,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0 2024-08-12 04:59:51,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1468660.0, ans=0.025 2024-08-12 04:59:53,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1468660.0, ans=0.025 2024-08-12 04:59:54,080 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 22 from Vox, 22 from AS 2024-08-12 04:59:59,150 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 1950, loss[loss=0.09408, beats_loss=0.01076, ecapa_loss=0.0001587, whisper_loss=0.08174, over 18198.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01098, ecapa_loss=0.0001749, whisper_loss=0.09219, over 3854456.49 frames. ], batch size: 71, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:00:00,898 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 22 from Vox, 22 from AS 2024-08-12 05:00:02,676 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts.
18 from LS+wenet, 22 from Vox, 18 from AS 2024-08-12 05:00:08,054 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 from AS 2024-08-12 05:00:08,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1468760.0, ans=0.125 2024-08-12 05:00:24,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1468860.0, ans=0.125 2024-08-12 05:00:45,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1469060.0, ans=0.125 2024-08-12 05:00:56,257 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 from AS 2024-08-12 05:01:01,822 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.456e+01 2.694e+01 2.989e+01 6.245e+01, threshold=5.388e+01, percent-clipped=1.0 2024-08-12 05:01:15,807 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2000, loss[loss=0.09831, beats_loss=0.01055, ecapa_loss=0.0001879, whisper_loss=0.08588, over 19919.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01098, ecapa_loss=0.000177, whisper_loss=0.09111, over 3820509.08 frames. ], batch size: 76, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:01:30,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1469360.0, ans=0.0 2024-08-12 05:01:37,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1469360.0, ans=0.125 2024-08-12 05:02:00,446 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts.
32 from LS+wenet, 20 from Vox, 43 from AS 2024-08-12 05:02:02,379 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 05:02:05,293 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 31 from LS+wenet, 17 from Vox, 21 from AS 2024-08-12 05:02:34,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2050, loss[loss=0.08273, beats_loss=0.01106, ecapa_loss=0.0001831, whisper_loss=0.06984, over 15518.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01096, ecapa_loss=0.0001788, whisper_loss=0.09195, over 3811040.92 frames. ], batch size: 65, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:03:11,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1469960.0, ans=0.1 2024-08-12 05:03:23,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1470060.0, ans=0.125 2024-08-12 05:03:31,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1470060.0, ans=0.125 2024-08-12 05:03:31,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1470060.0, ans=0.2 2024-08-12 05:03:37,127 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.542e+01 2.738e+01 3.129e+01 4.867e+01, threshold=5.477e+01, percent-clipped=0.0 2024-08-12 05:03:50,760 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2100, loss[loss=0.1019, beats_loss=0.01094, ecapa_loss=0.0001722, whisper_loss=0.08924, over 19842.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01098, ecapa_loss=0.000178, whisper_loss=0.09182, over 3807372.35 frames.
], batch size: 79, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:04:02,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1470260.0, ans=0.0 2024-08-12 05:04:03,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1470260.0, ans=0.2 2024-08-12 05:04:11,102 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2024-08-12 05:04:17,892 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 24 from Vox, 24 from AS 2024-08-12 05:05:07,970 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2150, loss[loss=0.09065, beats_loss=0.01066, ecapa_loss=0.0001635, whisper_loss=0.07836, over 14481.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01105, ecapa_loss=0.0001781, whisper_loss=0.09138, over 3820871.49 frames. ], batch size: 53, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:05:08,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1470760.0, ans=0.2 2024-08-12 05:05:16,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=1470760.0, ans=12.0 2024-08-12 05:05:30,989 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
29 from LS+wenet, 18 from Vox, 45 from AS 2024-08-12 05:05:46,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1470960.0, ans=0.1 2024-08-12 05:05:52,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1471060.0, ans=0.125 2024-08-12 05:05:52,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1471060.0, ans=0.125 2024-08-12 05:05:56,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1471060.0, ans=0.2 2024-08-12 05:06:07,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1471160.0, ans=0.0 2024-08-12 05:06:09,867 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.509e+01 2.893e+01 3.375e+01 5.887e+01, threshold=5.785e+01, percent-clipped=2.0 2024-08-12 05:06:14,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1471160.0, ans=0.0 2024-08-12 05:06:19,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1471160.0, ans=0.0 2024-08-12 05:06:19,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1471160.0, ans=15.0 2024-08-12 05:06:23,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2200, loss[loss=0.0866, beats_loss=0.01076, ecapa_loss=0.0002046, whisper_loss=0.07379, over 17402.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01109, ecapa_loss=0.0001773, whisper_loss=0.09165, over 3827825.43 frames.
], batch size: 76, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:06:27,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-08-12 05:06:51,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1471360.0, ans=0.125 2024-08-12 05:07:02,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1471460.0, ans=0.0 2024-08-12 05:07:10,785 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 12 from Vox, 29 from AS 2024-08-12 05:07:28,413 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 25 from Vox, 29 from AS 2024-08-12 05:07:34,578 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0 2024-08-12 05:07:41,056 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2250, loss[loss=0.09378, beats_loss=0.01185, ecapa_loss=0.0002057, whisper_loss=0.07988, over 16728.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01115, ecapa_loss=0.0001789, whisper_loss=0.09177, over 3825672.31 frames. ], batch size: 69, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:07:43,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1471760.0, ans=0.125 2024-08-12 05:07:55,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1471860.0, ans=0.125 2024-08-12 05:08:08,772 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 22 from Vox, 27 from AS 2024-08-12 05:08:25,296 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts.
20 from LS+wenet, 14 from Vox, 27 from AS 2024-08-12 05:08:34,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1472060.0, ans=0.125 2024-08-12 05:08:39,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1472060.0, ans=0.09899494936611666 2024-08-12 05:08:41,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1472060.0, ans=0.09899494936611666 2024-08-12 05:08:54,234 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.613e+01 2.941e+01 3.406e+01 8.387e+01, threshold=5.883e+01, percent-clipped=3.0 2024-08-12 05:09:11,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2300, loss[loss=0.1224, beats_loss=0.01031, ecapa_loss=0.000153, whisper_loss=0.1105, over 15971.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0111, ecapa_loss=0.00018, whisper_loss=0.0923, over 3850946.74 frames. ], batch size: 58, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:09:15,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.10 vs. limit=22.5 2024-08-12 05:09:33,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.73 vs. limit=15.0 2024-08-12 05:09:38,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1472360.0, ans=0.125 2024-08-12 05:09:40,246 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
23 from LS+wenet, 18 from Vox, 49 from AS 2024-08-12 05:10:00,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1472460.0, ans=0.0 2024-08-12 05:10:18,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1472560.0, ans=0.2 2024-08-12 05:10:21,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1472560.0, ans=0.1 2024-08-12 05:10:35,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1472660.0, ans=15.0 2024-08-12 05:10:46,897 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2350, loss[loss=0.09923, beats_loss=0.01264, ecapa_loss=0.0001453, whisper_loss=0.08514, over 23809.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01107, ecapa_loss=0.0001803, whisper_loss=0.0929, over 3863374.21 frames. ], batch size: 95, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:10:56,970 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 16 from Vox, 33 from AS 2024-08-12 05:11:04,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1472760.0, ans=0.125 2024-08-12 05:11:51,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1472960.0, ans=0.0 2024-08-12 05:12:06,835 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts.
23 from LS+wenet, 14 from Vox, 22 from AS 2024-08-12 05:12:18,875 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.614e+01 3.008e+01 3.445e+01 5.971e+01, threshold=6.017e+01, percent-clipped=1.0 2024-08-12 05:12:36,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1473260.0, ans=0.125 2024-08-12 05:12:37,552 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2400, loss[loss=0.1081, beats_loss=0.01166, ecapa_loss=0.000157, whisper_loss=0.09488, over 17739.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01104, ecapa_loss=0.00018, whisper_loss=0.09265, over 3849953.91 frames. ], batch size: 70, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:12:44,230 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 19 from Vox, 23 from AS 2024-08-12 05:12:45,217 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.33 vs. limit=15.0 2024-08-12 05:12:59,280 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 23 from Vox, 31 from AS 2024-08-12 05:13:01,725 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 21 from Vox, 36 from AS 2024-08-12 05:13:04,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1473360.0, ans=0.0 2024-08-12 05:13:17,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1473460.0, ans=0.0 2024-08-12 05:13:19,342 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts.
19 from LS+wenet, 20 from Vox, 36 from AS 2024-08-12 05:13:52,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1473560.0, ans=0.125 2024-08-12 05:14:03,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1473660.0, ans=0.125 2024-08-12 05:14:04,569 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 27 from Vox, 31 from AS 2024-08-12 05:14:12,668 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 from AS 2024-08-12 05:14:20,436 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2450, loss[loss=0.09888, beats_loss=0.0108, ecapa_loss=0.0001421, whisper_loss=0.08666, over 24416.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01104, ecapa_loss=0.0001791, whisper_loss=0.0927, over 3856347.46 frames. ], batch size: 92, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:14:29,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1473760.0, ans=0.1 2024-08-12 05:14:41,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1473860.0, ans=0.1 2024-08-12 05:15:19,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1474060.0, ans=0.125 2024-08-12 05:15:24,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1474060.0, ans=0.125 2024-08-12 05:15:38,197 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.575e+01 2.893e+01 3.388e+01 4.265e+01, threshold=5.785e+01, percent-clipped=0.0 2024-08-12 05:15:42,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256,
metric=4.42 vs. limit=12.0 2024-08-12 05:15:51,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2500, loss[loss=0.0935, beats_loss=0.01218, ecapa_loss=0.000167, whisper_loss=0.07965, over 16688.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01104, ecapa_loss=0.0001791, whisper_loss=0.09265, over 3833947.75 frames. ], batch size: 67, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:15:52,935 INFO [train_multi_KD3.py:844] (2/4) A total of 97 cuts. 28 from LS+wenet, 27 from Vox, 42 from AS 2024-08-12 05:16:01,466 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 from AS 2024-08-12 05:16:18,088 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 17 from Vox, 37 from AS 2024-08-12 05:16:38,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1474560.0, ans=0.2 2024-08-12 05:16:52,562 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 17 from Vox, 32 from AS 2024-08-12 05:16:54,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2550, loss[loss=0.103, beats_loss=0.012, ecapa_loss=0.0001673, whisper_loss=0.0893, over 22153.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01103, ecapa_loss=0.0001809, whisper_loss=0.09254, over 3846480.33 frames. ], batch size: 90, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:16:58,658 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-12 05:17:00,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1474760.0, ans=0.125 2024-08-12 05:17:03,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1474760.0, ans=0.1 2024-08-12 05:17:28,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1474960.0, ans=0.125 2024-08-12 05:17:28,955 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 from AS 2024-08-12 05:17:47,844 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 2.613e+01 2.908e+01 3.447e+01 1.061e+02, threshold=5.817e+01, percent-clipped=1.0 2024-08-12 05:17:49,517 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 29 from Vox, 42 from AS 2024-08-12 05:17:59,222 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2600, loss[loss=0.1233, beats_loss=0.009126, ecapa_loss=0.0001932, whisper_loss=0.1122, over 22332.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01099, ecapa_loss=0.0001811, whisper_loss=0.0928, over 3826778.34 frames. ], batch size: 91, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:18:23,462 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 from AS 2024-08-12 05:18:25,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1475460.0, ans=0.0 2024-08-12 05:18:34,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1475460.0, ans=0.1 2024-08-12 05:18:37,736 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 25 from Vox, 19 from AS 2024-08-12 05:18:40,173 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts.
25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 05:18:48,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1475560.0, ans=0.125 2024-08-12 05:18:48,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1475560.0, ans=0.125 2024-08-12 05:19:03,590 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2650, loss[loss=0.1075, beats_loss=0.00852, ecapa_loss=0.0001551, whisper_loss=0.09742, over 21662.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.0001808, whisper_loss=0.09258, over 3853505.93 frames. ], batch size: 80, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:19:12,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1475760.0, ans=0.95 2024-08-12 05:19:19,558 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 22 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 05:19:30,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1475960.0, ans=0.1 2024-08-12 05:19:56,920 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.506e+01 2.786e+01 3.189e+01 5.235e+01, threshold=5.572e+01, percent-clipped=0.0 2024-08-12 05:20:07,537 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-12 05:20:08,665 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2700, loss[loss=0.1184, beats_loss=0.01183, ecapa_loss=0.0001568, whisper_loss=0.1051, over 21546.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01111, ecapa_loss=0.0001795, whisper_loss=0.09177, over 3846564.55 frames. ], batch size: 85, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:20:30,479 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 05:20:35,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1476460.0, ans=0.125 2024-08-12 05:20:52,310 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 11 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 05:21:13,060 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2750, loss[loss=0.09431, beats_loss=0.01033, ecapa_loss=0.0002306, whisper_loss=0.08168, over 16990.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01112, ecapa_loss=0.000179, whisper_loss=0.0916, over 3843729.97 frames. ], batch size: 67, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:21:14,480 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-12 05:21:32,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=15.38 vs. limit=15.0 2024-08-12 05:21:34,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1476860.0, ans=0.125 2024-08-12 05:21:45,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1476960.0, ans=0.07 2024-08-12 05:21:49,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1476960.0, ans=0.0 2024-08-12 05:21:59,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1477060.0, ans=0.125 2024-08-12 05:22:01,842 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 05:22:05,355 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.574e+01 2.886e+01 3.333e+01 4.847e+01, threshold=5.772e+01, percent-clipped=0.0 2024-08-12 05:22:10,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2024-08-12 05:22:17,139 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2800, loss[loss=0.1057, beats_loss=0.008821, ecapa_loss=0.0001844, whisper_loss=0.095, over 16222.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01112, ecapa_loss=0.000179, whisper_loss=0.09197, over 3856735.54 frames. ], batch size: 62, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:22:29,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1477360.0, ans=0.125 2024-08-12 05:22:36,554 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0 2024-08-12 05:22:46,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1477460.0, ans=0.0 2024-08-12 05:22:54,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1477460.0, ans=0.125 2024-08-12 05:22:58,932 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 05:23:00,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1477560.0, ans=0.1 2024-08-12 05:23:10,309 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 05:23:20,534 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
17 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-12 05:23:25,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2850, loss[loss=0.1273, beats_loss=0.008884, ecapa_loss=0.0001734, whisper_loss=0.1167, over 24052.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0111, ecapa_loss=0.0001787, whisper_loss=0.09259, over 3840793.96 frames. ], batch size: 91, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:23:37,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1477760.0, ans=0.0 2024-08-12 05:23:40,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1477860.0, ans=0.0 2024-08-12 05:23:43,412 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 19 from LS+wenet, 30 from Vox, 42 fro AS 2024-08-12 05:23:46,795 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 05:23:58,441 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 05:24:10,128 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5 2024-08-12 05:24:30,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.606e+01 3.053e+01 3.517e+01 5.532e+01, threshold=6.106e+01, percent-clipped=0.0 2024-08-12 05:24:41,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.96 vs. limit=22.5 2024-08-12 05:24:44,732 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2900, loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.0001963, whisper_loss=0.09044, over 21791.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01111, ecapa_loss=0.0001811, whisper_loss=0.09201, over 3845084.99 frames. 
], batch size: 89, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:25:26,752 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 05:25:32,882 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 05:25:54,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1478760.0, ans=0.09899494936611666 2024-08-12 05:25:55,271 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 2950, loss[loss=0.1129, beats_loss=0.01204, ecapa_loss=0.0001696, whisper_loss=0.09918, over 23177.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01109, ecapa_loss=0.0001801, whisper_loss=0.09246, over 3866486.94 frames. ], batch size: 91, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:26:03,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1478760.0, ans=0.2 2024-08-12 05:26:18,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1478860.0, ans=0.0 2024-08-12 05:26:18,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1478860.0, ans=0.125 2024-08-12 05:26:25,703 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 31 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 05:26:48,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.658e+01 2.945e+01 3.393e+01 5.337e+01, threshold=5.890e+01, percent-clipped=0.0 2024-08-12 05:26:59,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1479260.0, ans=0.125 2024-08-12 05:27:00,058 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3000, loss[loss=0.1328, beats_loss=0.007131, ecapa_loss=0.0002569, whisper_loss=0.1231, over 15933.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01107, ecapa_loss=0.0001797, whisper_loss=0.09254, over 3874541.36 frames. ], batch size: 64, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:27:00,058 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 05:27:16,083 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.6291, 0.8927, 2.3750, 1.4256, 1.0512, 1.7704, 2.5366, 2.4800], device='cuda:2') 2024-08-12 05:27:41,670 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on ASR_libri: loss=0.2561, beats_loss=0, ecapa_loss=0.0006006, whisper_loss=0.2501, over 922467.00 frames. 2024-08-12 05:27:58,657 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on SV_voxceleb1: loss=0.004832, beats_loss=0, ecapa_loss=0.0004832, whisper_loss=0, over 939242.00 frames. 2024-08-12 05:30:00,055 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on AT_audioset: loss=0.02445, beats_loss=0.02445, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 05:30:00,059 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 05:30:00,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1479260.0, ans=0.05 2024-08-12 05:30:01,459 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 26 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 05:30:25,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.48 vs. 
limit=22.5 2024-08-12 05:30:27,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1479460.0, ans=0.0 2024-08-12 05:30:27,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1479460.0, ans=0.2 2024-08-12 05:30:41,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-12 05:30:50,325 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 05:31:04,601 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3050, loss[loss=0.1038, beats_loss=0.0103, ecapa_loss=0.0001749, whisper_loss=0.09175, over 17842.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.011, ecapa_loss=0.0001805, whisper_loss=0.09389, over 3911726.81 frames. ], batch size: 66, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:31:08,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1479760.0, ans=0.1 2024-08-12 05:31:22,738 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-12 05:31:25,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1479860.0, ans=0.1 2024-08-12 05:31:25,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1479860.0, ans=0.125 2024-08-12 05:31:28,204 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 05:31:29,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1479960.0, ans=0.125 2024-08-12 05:31:29,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1479960.0, ans=0.2 2024-08-12 05:31:40,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1479960.0, ans=0.125 2024-08-12 05:31:45,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-12 05:31:53,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1480060.0, ans=0.1 2024-08-12 05:32:00,993 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.536e+01 2.925e+01 3.464e+01 9.985e+01, threshold=5.850e+01, percent-clipped=2.0 2024-08-12 05:32:10,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1480160.0, ans=0.125 2024-08-12 05:32:12,245 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3100, loss[loss=0.1146, beats_loss=0.01244, ecapa_loss=0.0001745, whisper_loss=0.1004, over 18043.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01092, ecapa_loss=0.0001824, whisper_loss=0.09428, over 3896148.34 frames. 
], batch size: 73, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:32:13,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1480260.0, ans=0.0 2024-08-12 05:32:19,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1480260.0, ans=0.0 2024-08-12 05:32:24,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1480360.0, ans=0.125 2024-08-12 05:32:37,589 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 05:32:40,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-12 05:32:47,382 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 12 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 05:32:53,039 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 05:33:07,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1480660.0, ans=0.025 2024-08-12 05:33:08,236 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 05:33:09,658 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 05:33:13,748 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 05:33:17,573 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3150, loss[loss=0.1274, beats_loss=0.009816, ecapa_loss=0.0001971, whisper_loss=0.1156, over 23169.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01103, ecapa_loss=0.0001824, whisper_loss=0.09329, over 3887189.07 frames. 
], batch size: 92, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:33:26,522 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-12 05:33:29,822 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=22.5 2024-08-12 05:33:46,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1480960.0, ans=0.125 2024-08-12 05:33:48,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1480960.0, ans=0.0 2024-08-12 05:33:49,982 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 05:34:10,369 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.633e+01 2.990e+01 3.410e+01 4.926e+01, threshold=5.980e+01, percent-clipped=0.0 2024-08-12 05:34:17,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1481160.0, ans=0.125 2024-08-12 05:34:22,409 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3200, loss[loss=0.09522, beats_loss=0.01231, ecapa_loss=0.0001673, whisper_loss=0.08124, over 21588.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01105, ecapa_loss=0.0001813, whisper_loss=0.09305, over 3864135.62 frames. 
], batch size: 88, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:34:25,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1481260.0, ans=0.125 2024-08-12 05:34:38,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1481360.0, ans=0.0 2024-08-12 05:34:42,712 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. limit=10.0 2024-08-12 05:35:05,811 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 05:35:08,234 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 05:35:23,706 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 05:35:27,333 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3250, loss[loss=0.09493, beats_loss=0.01329, ecapa_loss=0.0001775, whisper_loss=0.07987, over 18188.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01096, ecapa_loss=0.0001818, whisper_loss=0.09398, over 3884484.20 frames. ], batch size: 75, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:35:33,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1481760.0, ans=0.1 2024-08-12 05:35:34,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1481760.0, ans=0.1 2024-08-12 05:35:39,595 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 05:35:44,515 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-12 05:35:52,377 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
15 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 05:35:59,176 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-12 05:36:11,337 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.82 vs. limit=15.0 2024-08-12 05:36:14,209 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.38 vs. limit=22.5 2024-08-12 05:36:18,811 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 05:36:21,191 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.550e+01 2.874e+01 3.283e+01 4.994e+01, threshold=5.748e+01, percent-clipped=0.0 2024-08-12 05:36:25,817 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-12 05:36:32,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1482260.0, ans=0.125 2024-08-12 05:36:33,123 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3300, loss[loss=0.117, beats_loss=0.01155, ecapa_loss=0.0001738, whisper_loss=0.1037, over 13965.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01095, ecapa_loss=0.000183, whisper_loss=0.09389, over 3891706.24 frames. ], batch size: 55, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:36:39,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.66 vs. 
limit=22.5 2024-08-12 05:36:41,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1482260.0, ans=0.0 2024-08-12 05:36:46,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1482360.0, ans=0.125 2024-08-12 05:36:54,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1482360.0, ans=0.125 2024-08-12 05:37:25,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1482660.0, ans=0.2 2024-08-12 05:37:29,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1482660.0, ans=0.125 2024-08-12 05:37:35,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.09 vs. limit=10.0 2024-08-12 05:37:37,762 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3350, loss[loss=0.106, beats_loss=0.007407, ecapa_loss=0.0002148, whisper_loss=0.09642, over 15165.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01095, ecapa_loss=0.0001812, whisper_loss=0.09347, over 3894719.32 frames. 
], batch size: 60, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:37:48,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1482760.0, ans=0.0 2024-08-12 05:37:55,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1482860.0, ans=0.125 2024-08-12 05:38:16,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1483060.0, ans=0.2 2024-08-12 05:38:22,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.41 vs. limit=22.5 2024-08-12 05:38:27,061 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2024-08-12 05:38:29,313 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 05:38:30,418 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.536e+01 3.034e+01 3.396e+01 1.773e+02, threshold=6.068e+01, percent-clipped=2.0 2024-08-12 05:38:36,069 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 05:38:36,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1483160.0, ans=0.1 2024-08-12 05:38:36,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1483160.0, ans=0.09899494936611666 2024-08-12 05:38:37,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1483160.0, ans=0.2 2024-08-12 05:38:39,878 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
36 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 05:38:42,251 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3400, loss[loss=0.1021, beats_loss=0.01074, ecapa_loss=0.0001736, whisper_loss=0.08967, over 17368.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01102, ecapa_loss=0.0001807, whisper_loss=0.09324, over 3904580.35 frames. ], batch size: 70, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:38:54,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1483360.0, ans=0.0 2024-08-12 05:38:56,348 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.08 vs. limit=22.5 2024-08-12 05:39:11,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1483460.0, ans=0.125 2024-08-12 05:39:14,177 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 05:39:35,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=22.5 2024-08-12 05:39:37,531 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-12 05:39:38,790 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 05:39:43,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=15.0 2024-08-12 05:39:49,989 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3450, loss[loss=0.1095, beats_loss=0.01129, ecapa_loss=0.0002014, whisper_loss=0.09616, over 22255.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.011, ecapa_loss=0.0001812, whisper_loss=0.09356, over 3917220.14 frames. 
], batch size: 90, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:39:52,936 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 05:39:56,833 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 05:39:58,403 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 05:39:59,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1483760.0, ans=10.0 2024-08-12 05:40:06,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1483860.0, ans=0.0 2024-08-12 05:40:10,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1483860.0, ans=22.5 2024-08-12 05:40:46,623 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.618e+01 3.064e+01 3.498e+01 5.812e+01, threshold=6.129e+01, percent-clipped=0.0 2024-08-12 05:40:52,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1484160.0, ans=0.125 2024-08-12 05:40:57,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1484160.0, ans=0.125 2024-08-12 05:40:59,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1484260.0, ans=0.025 2024-08-12 05:40:59,832 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3500, loss[loss=0.06911, beats_loss=0.01086, ecapa_loss=0.0002129, whisper_loss=0.05613, over 15031.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01104, ecapa_loss=0.0001814, whisper_loss=0.0925, over 3887870.08 frames. 
], batch size: 65, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:41:06,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1484260.0, ans=0.125 2024-08-12 05:41:14,725 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 05:41:30,083 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-12 05:41:30,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1484460.0, ans=0.09899494936611666 2024-08-12 05:41:51,286 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 05:42:10,311 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3550, loss[loss=0.09996, beats_loss=0.01348, ecapa_loss=0.0001319, whisper_loss=0.08517, over 18837.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01112, ecapa_loss=0.0001802, whisper_loss=0.09196, over 3891657.06 frames. ], batch size: 73, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:42:11,899 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 05:42:15,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1484760.0, ans=0.1 2024-08-12 05:42:28,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1484860.0, ans=0.0 2024-08-12 05:43:02,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.04 vs. limit=15.0 2024-08-12 05:43:07,473 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 05:43:09,655 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.668e+01 2.975e+01 3.438e+01 5.088e+01, threshold=5.950e+01, percent-clipped=0.0 2024-08-12 05:43:22,606 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3600, loss[loss=0.1274, beats_loss=0.01075, ecapa_loss=0.0001953, whisper_loss=0.1147, over 23570.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01112, ecapa_loss=0.0001806, whisper_loss=0.0917, over 3871726.54 frames. ], batch size: 95, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:43:39,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1485360.0, ans=0.125 2024-08-12 05:43:39,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1485360.0, ans=0.0 2024-08-12 05:44:12,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1485560.0, ans=0.125 2024-08-12 05:44:33,709 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3650, loss[loss=0.1085, beats_loss=0.01159, ecapa_loss=0.0001807, whisper_loss=0.09507, over 22908.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0111, ecapa_loss=0.0001808, whisper_loss=0.09214, over 3875702.78 frames. 
], batch size: 94, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:44:36,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1485760.0, ans=0.125 2024-08-12 05:44:36,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1485760.0, ans=0.125 2024-08-12 05:44:37,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1485760.0, ans=0.0 2024-08-12 05:44:48,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1485860.0, ans=0.0 2024-08-12 05:44:50,957 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 05:44:51,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1485860.0, ans=15.0 2024-08-12 05:44:52,426 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 05:45:13,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1485960.0, ans=0.125 2024-08-12 05:45:17,221 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 05:45:18,495 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 05:45:32,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.529e+01 2.870e+01 3.231e+01 5.224e+01, threshold=5.739e+01, percent-clipped=0.0 2024-08-12 05:45:35,768 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
26 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-12 05:45:45,718 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3700, loss[loss=0.09175, beats_loss=0.01184, ecapa_loss=0.000202, whisper_loss=0.07788, over 18856.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01108, ecapa_loss=0.0001822, whisper_loss=0.09202, over 3854234.38 frames. ], batch size: 79, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:45:50,028 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.16 vs. limit=10.0 2024-08-12 05:45:58,969 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 05:45:59,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1486360.0, ans=0.0 2024-08-12 05:46:10,986 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 37 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 05:46:25,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1486460.0, ans=0.125 2024-08-12 05:46:28,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1486560.0, ans=0.125 2024-08-12 05:46:30,847 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-12 05:46:41,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1486560.0, ans=0.125 2024-08-12 05:46:50,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1486660.0, ans=0.0 2024-08-12 05:46:51,481 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 05:46:53,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1486660.0, ans=0.125 2024-08-12 05:46:57,556 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3750, loss[loss=0.08531, beats_loss=0.01339, ecapa_loss=0.0001607, whisper_loss=0.07031, over 15083.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01121, ecapa_loss=0.0001801, whisper_loss=0.09099, over 3869020.10 frames. ], batch size: 62, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:47:07,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1486760.0, ans=0.05 2024-08-12 05:47:15,077 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 05:47:28,483 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 05:47:29,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1486960.0, ans=0.0 2024-08-12 05:47:30,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1486960.0, ans=0.0 2024-08-12 05:47:41,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1487060.0, ans=0.125 2024-08-12 05:47:55,534 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.578e+01 2.846e+01 3.197e+01 4.164e+01, threshold=5.692e+01, percent-clipped=0.0 2024-08-12 05:48:09,238 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3800, loss[loss=0.09569, beats_loss=0.01132, ecapa_loss=0.0001564, whisper_loss=0.0828, over 22123.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01123, ecapa_loss=0.0001819, whisper_loss=0.09033, over 3859181.80 frames. 
], batch size: 86, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:48:15,680 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2024-08-12 05:48:16,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1487260.0, ans=0.2 2024-08-12 05:48:37,451 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-12 05:48:43,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2024-08-12 05:49:06,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1487660.0, ans=0.125 2024-08-12 05:49:22,743 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3850, loss[loss=0.1157, beats_loss=0.00992, ecapa_loss=0.0001694, whisper_loss=0.1041, over 22544.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01121, ecapa_loss=0.0001802, whisper_loss=0.09112, over 3870323.32 frames. ], batch size: 89, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:49:28,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1487760.0, ans=0.125 2024-08-12 05:49:43,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1487860.0, ans=0.125 2024-08-12 05:49:46,114 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 05:49:46,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1487860.0, ans=0.0 2024-08-12 05:49:57,699 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
20 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-12 05:49:59,514 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2024-08-12 05:50:08,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1488060.0, ans=0.125 2024-08-12 05:50:08,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1488060.0, ans=0.125 2024-08-12 05:50:15,616 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 05:50:17,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1488060.0, ans=0.07 2024-08-12 05:50:22,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.543e+01 2.911e+01 3.298e+01 4.140e+01, threshold=5.821e+01, percent-clipped=0.0 2024-08-12 05:50:26,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1488160.0, ans=0.0 2024-08-12 05:50:33,675 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2024-08-12 05:50:35,764 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3900, loss[loss=0.05801, beats_loss=0.0142, ecapa_loss=0.0001405, whisper_loss=0.04241, over 13941.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01126, ecapa_loss=0.000181, whisper_loss=0.09084, over 3868607.78 frames. ], batch size: 56, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:50:45,428 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-12 05:50:54,562 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
34 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-12 05:51:05,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1488460.0, ans=0.0 2024-08-12 05:51:16,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1488460.0, ans=0.1 2024-08-12 05:51:17,612 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-12 05:51:20,716 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.689e+01 2024-08-12 05:51:34,002 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 05:51:50,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1488760.0, ans=0.0 2024-08-12 05:51:51,425 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 3950, loss[loss=0.09853, beats_loss=0.01128, ecapa_loss=0.0001977, whisper_loss=0.08527, over 15155.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0112, ecapa_loss=0.0001827, whisper_loss=0.09174, over 3908849.97 frames. ], batch size: 59, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:52:07,969 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 05:52:10,909 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 05:52:18,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.59 vs. limit=22.5 2024-08-12 05:52:22,807 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
17 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-12 05:52:45,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1489060.0, ans=0.125 2024-08-12 05:52:53,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.639e+01 2.878e+01 3.466e+01 7.368e+01, threshold=5.755e+01, percent-clipped=1.0 2024-08-12 05:53:04,550 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 05:53:07,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4000, loss[loss=0.1187, beats_loss=0.009126, ecapa_loss=0.0002259, whisper_loss=0.1073, over 14012.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01113, ecapa_loss=0.0001841, whisper_loss=0.09206, over 3895302.48 frames. ], batch size: 54, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:53:07,898 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 37 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 05:53:19,661 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 05:53:23,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1489360.0, ans=15.0 2024-08-12 05:53:34,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1489360.0, ans=0.125 2024-08-12 05:53:36,334 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 05:53:41,795 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
23 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 05:53:44,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1489460.0, ans=0.125 2024-08-12 05:53:55,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1489560.0, ans=0.125 2024-08-12 05:54:03,623 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 05:54:05,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1489560.0, ans=0.125 2024-08-12 05:54:10,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1489660.0, ans=22.5 2024-08-12 05:54:19,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1489660.0, ans=0.125 2024-08-12 05:54:23,076 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4050, loss[loss=0.08312, beats_loss=0.01434, ecapa_loss=0.0001866, whisper_loss=0.06691, over 21903.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01102, ecapa_loss=0.0001863, whisper_loss=0.09317, over 3898943.11 frames. ], batch size: 93, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:54:40,270 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 05:54:56,025 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. 
limit=15.0 2024-08-12 05:55:04,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1489960.0, ans=0.0 2024-08-12 05:55:25,111 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.625e+01 2.909e+01 3.364e+01 7.852e+01, threshold=5.817e+01, percent-clipped=2.0 2024-08-12 05:55:35,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1490160.0, ans=0.125 2024-08-12 05:55:39,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1490260.0, ans=0.2 2024-08-12 05:55:39,776 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4100, loss[loss=0.1159, beats_loss=0.01118, ecapa_loss=0.0001438, whisper_loss=0.1033, over 23481.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01098, ecapa_loss=0.0001858, whisper_loss=0.09304, over 3887600.50 frames. ], batch size: 92, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:55:50,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1490260.0, ans=0.1 2024-08-12 05:55:59,517 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 05:56:06,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1490360.0, ans=0.125 2024-08-12 05:56:41,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2024-08-12 05:56:56,095 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4150, loss[loss=0.1081, beats_loss=0.01137, ecapa_loss=0.0001824, whisper_loss=0.09494, over 20546.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.01096, ecapa_loss=0.0001856, whisper_loss=0.09315, over 3856294.60 frames. ], batch size: 83, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:56:59,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1490760.0, ans=0.1 2024-08-12 05:57:25,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1490860.0, ans=0.125 2024-08-12 05:57:44,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1491060.0, ans=0.0 2024-08-12 05:57:47,232 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 05:57:49,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1491060.0, ans=0.035 2024-08-12 05:57:52,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1491060.0, ans=0.035 2024-08-12 05:58:01,423 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.619e+01 2.867e+01 3.217e+01 5.431e+01, threshold=5.734e+01, percent-clipped=0.0 2024-08-12 05:58:01,683 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 05:58:08,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1491160.0, ans=0.125 2024-08-12 05:58:15,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4200, loss[loss=0.1055, beats_loss=0.01078, ecapa_loss=0.0001566, whisper_loss=0.09318, over 23616.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01098, ecapa_loss=0.0001845, whisper_loss=0.09321, over 3882395.12 frames. 
], batch size: 92, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:58:17,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1491260.0, ans=0.0 2024-08-12 05:58:20,544 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-12 05:58:43,362 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-12 05:58:44,940 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 05:58:49,334 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 05:58:54,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1491460.0, ans=0.125 2024-08-12 05:58:58,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1491460.0, ans=0.125 2024-08-12 05:59:02,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.65 vs. limit=22.5 2024-08-12 05:59:13,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1491560.0, ans=0.125 2024-08-12 05:59:14,725 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-12 05:59:21,340 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 05:59:32,526 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 05:59:34,817 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4250, loss[loss=0.09111, beats_loss=0.0133, ecapa_loss=0.0001575, whisper_loss=0.07623, over 23093.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.01093, ecapa_loss=0.0001834, whisper_loss=0.09327, over 3917793.30 frames. ], batch size: 94, lr: 5.80e-03, grad_scale: 1.152921504606847e+18 2024-08-12 05:59:35,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1491760.0, ans=0.0 2024-08-12 05:59:45,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1491760.0, ans=0.125 2024-08-12 05:59:55,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1491860.0, ans=0.0 2024-08-12 06:00:09,718 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 06:00:13,488 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 10 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-12 06:00:22,927 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 06:00:23,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1492060.0, ans=0.125 2024-08-12 06:00:24,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1492060.0, ans=0.125 2024-08-12 06:00:40,481 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.448e+01 2.725e+01 3.062e+01 4.978e+01, threshold=5.450e+01, percent-clipped=0.0 2024-08-12 06:00:49,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1492160.0, ans=0.125 2024-08-12 06:00:56,064 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4300, loss[loss=0.1164, beats_loss=0.01142, ecapa_loss=0.0001578, whisper_loss=0.1034, over 21092.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.01103, ecapa_loss=0.0001819, whisper_loss=0.09312, over 3915389.75 frames. ], batch size: 81, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:00:59,646 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.767e+01 2024-08-12 06:01:26,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1492460.0, ans=0.2 2024-08-12 06:01:36,328 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 06:01:36,975 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.11 vs. limit=10.0 2024-08-12 06:01:56,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1492560.0, ans=0.0 2024-08-12 06:02:07,664 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 06:02:16,107 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4350, loss[loss=0.08344, beats_loss=0.0149, ecapa_loss=0.0001485, whisper_loss=0.06706, over 21625.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01108, ecapa_loss=0.0001826, whisper_loss=0.09215, over 3879770.00 frames. ], batch size: 91, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:02:18,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1492760.0, ans=0.125 2024-08-12 06:02:34,834 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
24 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-12 06:02:57,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1492960.0, ans=0.1 2024-08-12 06:02:57,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-12 06:02:58,053 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.47 vs. limit=15.0 2024-08-12 06:03:13,310 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 06:03:16,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1493060.0, ans=0.0 2024-08-12 06:03:25,248 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.573e+01 2.985e+01 3.568e+01 9.873e+01, threshold=5.969e+01, percent-clipped=3.0 2024-08-12 06:03:36,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1493160.0, ans=0.125 2024-08-12 06:03:38,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1493160.0, ans=0.125 2024-08-12 06:03:40,616 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4400, loss[loss=0.107, beats_loss=0.01279, ecapa_loss=0.0001633, whisper_loss=0.09255, over 16487.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01107, ecapa_loss=0.0001827, whisper_loss=0.09281, over 3871373.40 frames. 
], batch size: 62, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:03:52,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1493260.0, ans=0.5 2024-08-12 06:03:53,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1493260.0, ans=0.125 2024-08-12 06:04:33,012 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 06:04:46,521 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-12 06:04:46,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1493660.0, ans=0.1 2024-08-12 06:04:47,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=12.0 2024-08-12 06:05:05,058 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4450, loss[loss=0.08207, beats_loss=0.0147, ecapa_loss=0.0001119, whisper_loss=0.06625, over 22754.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01102, ecapa_loss=0.0001828, whisper_loss=0.09227, over 3863572.60 frames. ], batch size: 89, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:05:23,605 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 06:05:31,383 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5 2024-08-12 06:05:34,300 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-12 06:05:35,933 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 06:05:58,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1494060.0, ans=0.125 2024-08-12 06:06:04,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1494060.0, ans=0.07 2024-08-12 06:06:06,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2024-08-12 06:06:07,298 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=12.0 2024-08-12 06:06:10,636 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 06:06:13,903 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.568e+01 2.752e+01 3.153e+01 4.560e+01, threshold=5.503e+01, percent-clipped=0.0 2024-08-12 06:06:24,424 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 06:06:29,680 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4500, loss[loss=0.09713, beats_loss=0.01185, ecapa_loss=0.0001806, whisper_loss=0.08348, over 22053.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01108, ecapa_loss=0.0001814, whisper_loss=0.09178, over 3888254.55 frames. ], batch size: 88, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:07:01,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1494360.0, ans=0.1 2024-08-12 06:07:04,121 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 06:07:41,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1494660.0, ans=0.1 2024-08-12 06:07:54,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.73 vs. limit=22.5 2024-08-12 06:07:55,731 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4550, loss[loss=0.0929, beats_loss=0.0128, ecapa_loss=0.0001911, whisper_loss=0.07819, over 23317.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01107, ecapa_loss=0.0001823, whisper_loss=0.09181, over 3903635.32 frames. ], batch size: 97, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:08:01,013 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.340e-01 2024-08-12 06:08:37,458 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-12 06:09:05,893 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.505e+01 2.717e+01 3.004e+01 5.094e+01, threshold=5.435e+01, percent-clipped=0.0 2024-08-12 06:09:07,943 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 21 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-12 06:09:18,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1495160.0, ans=0.125 2024-08-12 06:09:20,739 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4600, loss[loss=0.09542, beats_loss=0.009912, ecapa_loss=0.000145, whisper_loss=0.08406, over 16553.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01117, ecapa_loss=0.0001826, whisper_loss=0.09073, over 3926421.57 frames. 
], batch size: 61, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:09:21,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.42 vs. limit=22.5 2024-08-12 06:09:26,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1495260.0, ans=0.2 2024-08-12 06:09:32,685 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 06:09:34,648 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 06:09:53,260 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 06:10:09,970 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-12 06:10:10,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1495560.0, ans=0.0 2024-08-12 06:10:16,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1495560.0, ans=0.0 2024-08-12 06:10:20,675 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2024-08-12 06:10:30,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1495660.0, ans=0.2 2024-08-12 06:10:39,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1495660.0, ans=0.125 2024-08-12 06:10:44,560 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4650, loss[loss=0.1047, beats_loss=0.0113, ecapa_loss=0.0001412, whisper_loss=0.09201, over 13474.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01114, ecapa_loss=0.0001836, whisper_loss=0.09132, over 3924335.39 frames. ], batch size: 53, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:10:54,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0 2024-08-12 06:11:01,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=1495860.0, ans=15.0 2024-08-12 06:11:05,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1495860.0, ans=0.125 2024-08-12 06:11:37,959 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 06:11:54,242 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.541e+01 2.726e+01 3.242e+01 5.233e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-12 06:12:09,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4700, loss[loss=0.1246, beats_loss=0.007916, ecapa_loss=0.0002297, whisper_loss=0.1144, over 17678.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.0001836, whisper_loss=0.09236, over 3913715.08 frames. ], batch size: 73, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:12:16,946 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-12 06:12:17,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.55 vs. limit=15.0 2024-08-12 06:12:24,830 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-12 06:12:33,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. 
limit=15.0 2024-08-12 06:12:37,850 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 06:13:19,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1496660.0, ans=0.125 2024-08-12 06:13:30,810 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4750, loss[loss=0.1183, beats_loss=0.009743, ecapa_loss=0.0002175, whisper_loss=0.1064, over 18256.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01112, ecapa_loss=0.0001814, whisper_loss=0.09187, over 3923829.99 frames. ], batch size: 76, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:13:32,329 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 06:14:11,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1496960.0, ans=0.125 2024-08-12 06:14:29,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1497060.0, ans=0.0 2024-08-12 06:14:30,102 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.52 vs. limit=15.0 2024-08-12 06:14:36,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.611e+01 2.918e+01 3.267e+01 6.538e+01, threshold=5.836e+01, percent-clipped=2.0 2024-08-12 06:14:40,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.94 vs. 
limit=15.0 2024-08-12 06:14:50,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1497260.0, ans=0.125 2024-08-12 06:14:51,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4800, loss[loss=0.09708, beats_loss=0.01245, ecapa_loss=0.0001629, whisper_loss=0.08301, over 18965.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01124, ecapa_loss=0.0001807, whisper_loss=0.09135, over 3924527.96 frames. ], batch size: 74, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:14:53,029 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 20 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-12 06:14:53,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2024-08-12 06:14:57,524 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-12 06:15:01,161 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 06:15:04,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1497260.0, ans=0.1 2024-08-12 06:15:14,355 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-12 06:15:17,553 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 06:15:19,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1497360.0, ans=0.1 2024-08-12 06:15:35,257 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 06:15:44,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.58 vs. 
limit=12.0 2024-08-12 06:15:45,843 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-08-12 06:15:54,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1497560.0, ans=0.125 2024-08-12 06:16:04,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1497660.0, ans=0.125 2024-08-12 06:16:13,548 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4850, loss[loss=0.1185, beats_loss=0.01112, ecapa_loss=0.0001492, whisper_loss=0.1058, over 17968.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01123, ecapa_loss=0.0001804, whisper_loss=0.09172, over 3910446.21 frames. ], batch size: 67, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:16:31,865 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-12 06:16:33,511 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 06:16:43,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0 2024-08-12 06:16:45,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.66 vs. limit=12.0 2024-08-12 06:16:46,207 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 17 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 06:17:03,780 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
21 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 06:17:19,448 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.592e+01 2.912e+01 3.180e+01 4.291e+01, threshold=5.823e+01, percent-clipped=0.0 2024-08-12 06:17:34,497 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4900, loss[loss=0.1184, beats_loss=0.007851, ecapa_loss=0.000204, whisper_loss=0.1086, over 17187.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01121, ecapa_loss=0.0001812, whisper_loss=0.09142, over 3892757.03 frames. ], batch size: 68, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:17:41,811 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 06:17:57,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1498360.0, ans=0.0 2024-08-12 06:18:11,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1498460.0, ans=0.05 2024-08-12 06:18:13,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1498460.0, ans=0.125 2024-08-12 06:18:17,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1498460.0, ans=0.0 2024-08-12 06:18:21,389 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 06:18:27,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0 2024-08-12 06:18:30,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.90 vs. 
limit=10.0 2024-08-12 06:18:34,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1498560.0, ans=0.0 2024-08-12 06:18:38,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1498560.0, ans=0.125 2024-08-12 06:18:57,152 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 4950, loss[loss=0.1077, beats_loss=0.01134, ecapa_loss=0.000178, whisper_loss=0.09453, over 22809.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01122, ecapa_loss=0.0001816, whisper_loss=0.09119, over 3913611.81 frames. ], batch size: 89, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:19:02,935 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 06:19:07,515 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 06:19:24,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1498860.0, ans=0.0 2024-08-12 06:20:01,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1499160.0, ans=0.1 2024-08-12 06:20:04,799 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.693e+01 3.094e+01 3.524e+01 6.311e+01, threshold=6.188e+01, percent-clipped=2.0 2024-08-12 06:20:12,388 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. 
limit=6.0 2024-08-12 06:20:15,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1499160.0, ans=0.125 2024-08-12 06:20:19,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0 2024-08-12 06:20:19,928 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5000, loss[loss=0.0787, beats_loss=0.01141, ecapa_loss=0.0002093, whisper_loss=0.06519, over 16313.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01114, ecapa_loss=0.0001833, whisper_loss=0.09129, over 3897051.79 frames. ], batch size: 70, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:20:34,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1499360.0, ans=0.125 2024-08-12 06:20:42,620 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-12 06:20:47,262 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 06:20:47,744 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0 2024-08-12 06:20:54,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1499460.0, ans=0.125 2024-08-12 06:21:01,876 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 39 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 06:21:24,214 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.42 vs. 
limit=15.0 2024-08-12 06:21:25,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1499660.0, ans=0.125 2024-08-12 06:21:32,874 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 9 from Vox, 38 fro AS 2024-08-12 06:21:41,748 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5050, loss[loss=0.1037, beats_loss=0.009175, ecapa_loss=0.0002149, whisper_loss=0.09242, over 18534.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01118, ecapa_loss=0.0001838, whisper_loss=0.0918, over 3901476.69 frames. ], batch size: 77, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:21:58,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1499860.0, ans=0.0 2024-08-12 06:22:10,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1499860.0, ans=0.125 2024-08-12 06:22:18,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1499960.0, ans=0.0 2024-08-12 06:22:51,070 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.582e+01 2.858e+01 3.371e+01 2.461e+02, threshold=5.717e+01, percent-clipped=1.0 2024-08-12 06:22:51,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1500160.0, ans=0.1 2024-08-12 06:23:03,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1500160.0, ans=0.0 2024-08-12 06:23:05,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5100, loss[loss=0.09771, beats_loss=0.01473, ecapa_loss=0.0001647, whisper_loss=0.08133, over 19490.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01118, ecapa_loss=0.0001829, whisper_loss=0.09195, over 3884372.80 frames. 
], batch size: 79, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:23:06,129 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 33 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 06:23:35,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1500360.0, ans=0.0 2024-08-12 06:23:36,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=15.0 2024-08-12 06:23:43,976 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.32 vs. limit=15.0 2024-08-12 06:23:47,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1500460.0, ans=0.125 2024-08-12 06:23:51,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1500460.0, ans=0.95 2024-08-12 06:24:02,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1500560.0, ans=0.125 2024-08-12 06:24:09,104 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-12 06:24:18,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1500660.0, ans=0.2 2024-08-12 06:24:19,721 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 06:24:27,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5150, loss[loss=0.09166, beats_loss=0.01111, ecapa_loss=0.0001545, whisper_loss=0.079, over 17270.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01119, ecapa_loss=0.0001808, whisper_loss=0.09236, over 3885726.55 frames. 
], batch size: 66, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:24:31,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1500760.0, ans=10.0 2024-08-12 06:24:32,101 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.15 vs. limit=10.0 2024-08-12 06:24:32,981 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 06:24:56,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1500860.0, ans=0.09899494936611666 2024-08-12 06:25:01,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2024-08-12 06:25:22,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1501060.0, ans=0.125 2024-08-12 06:25:23,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1501060.0, ans=10.0 2024-08-12 06:26:03,299 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.539e+01 2.805e+01 3.216e+01 1.904e+02, threshold=5.610e+01, percent-clipped=1.0 2024-08-12 06:26:04,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1501160.0, ans=0.5 2024-08-12 06:26:12,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1501160.0, ans=0.125 2024-08-12 06:26:21,170 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5200, loss[loss=0.0846, beats_loss=0.0135, ecapa_loss=0.00017, whisper_loss=0.0694, over 14249.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.01107, ecapa_loss=0.0001805, whisper_loss=0.09294, over 3897570.90 frames. ], batch size: 60, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:26:46,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1501360.0, ans=0.0 2024-08-12 06:27:19,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2024-08-12 06:27:49,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5250, loss[loss=0.1189, beats_loss=0.01051, ecapa_loss=0.0001647, whisper_loss=0.1068, over 14599.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01108, ecapa_loss=0.0001801, whisper_loss=0.09249, over 3891595.37 frames. ], batch size: 58, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:27:57,173 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 06:27:57,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1501760.0, ans=0.5 2024-08-12 06:27:59,116 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 06:28:08,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1501860.0, ans=0.125 2024-08-12 06:28:47,836 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 06:28:49,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1502060.0, ans=0.09899494936611666 2024-08-12 06:28:58,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.556e+01 2.811e+01 3.138e+01 9.826e+01, threshold=5.623e+01, percent-clipped=1.0 2024-08-12 06:29:13,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5300, loss[loss=0.121, beats_loss=0.008804, ecapa_loss=0.0001938, whisper_loss=0.1103, over 21762.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01106, ecapa_loss=0.0001795, whisper_loss=0.09225, over 3900113.38 frames. ], batch size: 85, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:29:49,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1502460.0, ans=0.025 2024-08-12 06:30:07,384 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 26 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 06:30:14,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1502560.0, ans=0.2 2024-08-12 06:30:28,197 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.443e-01 2024-08-12 06:30:29,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1502660.0, ans=0.2 2024-08-12 06:30:33,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1502660.0, ans=0.1 2024-08-12 06:30:36,256 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5350, loss[loss=0.08234, beats_loss=0.01216, ecapa_loss=0.0001339, whisper_loss=0.06885, over 16056.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01107, ecapa_loss=0.0001773, whisper_loss=0.09188, over 3886377.02 frames. 
], batch size: 61, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:30:40,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.69 vs. limit=6.0 2024-08-12 06:30:48,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1502760.0, ans=0.1 2024-08-12 06:30:54,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1502860.0, ans=0.125 2024-08-12 06:30:58,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1502860.0, ans=10.0 2024-08-12 06:31:22,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1502960.0, ans=0.0 2024-08-12 06:31:27,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1503060.0, ans=0.0 2024-08-12 06:31:32,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0 2024-08-12 06:31:35,476 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-12 06:31:35,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1503060.0, ans=0.125 2024-08-12 06:31:43,492 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.466e+01 2.824e+01 3.264e+01 5.204e+01, threshold=5.648e+01, percent-clipped=0.0 2024-08-12 06:31:48,257 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-12 06:31:57,464 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5400, loss[loss=0.1237, beats_loss=0.01048, ecapa_loss=0.0001619, whisper_loss=0.1116, over 16667.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01108, ecapa_loss=0.0001771, whisper_loss=0.0919, over 3874942.44 frames. ], batch size: 64, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:32:03,061 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.43 vs. limit=22.5 2024-08-12 06:32:23,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1503360.0, ans=0.2 2024-08-12 06:32:31,348 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 06:32:31,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1503460.0, ans=0.0 2024-08-12 06:32:42,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1503460.0, ans=0.5 2024-08-12 06:32:44,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1503560.0, ans=0.025 2024-08-12 06:32:49,111 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-12 06:33:13,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1503660.0, ans=0.1 2024-08-12 06:33:14,133 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. 
limit=6.0 2024-08-12 06:33:17,802 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5450, loss[loss=0.1187, beats_loss=0.009412, ecapa_loss=0.0001748, whisper_loss=0.1076, over 14322.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0111, ecapa_loss=0.0001773, whisper_loss=0.09269, over 3869069.61 frames. ], batch size: 55, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:33:30,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1503760.0, ans=0.0 2024-08-12 06:33:32,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=1503860.0, ans=0.1 2024-08-12 06:33:35,640 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 06:33:39,019 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-12 06:33:44,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1503860.0, ans=0.1 2024-08-12 06:33:45,335 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-12 06:33:59,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.32 vs. 
limit=15.0 2024-08-12 06:34:07,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1504060.0, ans=0.125 2024-08-12 06:34:10,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1504060.0, ans=0.0 2024-08-12 06:34:12,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1504060.0, ans=0.0 2024-08-12 06:34:20,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1504160.0, ans=0.05 2024-08-12 06:34:23,023 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.572e+01 2.892e+01 3.418e+01 4.149e+01, threshold=5.785e+01, percent-clipped=0.0 2024-08-12 06:34:34,816 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 06:34:35,056 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2024-08-12 06:34:37,116 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5500, loss[loss=0.09703, beats_loss=0.009694, ecapa_loss=0.0002307, whisper_loss=0.08503, over 15207.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01111, ecapa_loss=0.0001777, whisper_loss=0.09236, over 3860049.26 frames. ], batch size: 63, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:34:48,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1504260.0, ans=0.2 2024-08-12 06:35:12,574 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 06:35:17,341 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 06:35:42,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1504660.0, ans=0.1 2024-08-12 06:35:44,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2024-08-12 06:35:49,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1504660.0, ans=0.0 2024-08-12 06:35:56,460 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5550, loss[loss=0.1038, beats_loss=0.007916, ecapa_loss=0.0002293, whisper_loss=0.09358, over 13973.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01113, ecapa_loss=0.0001793, whisper_loss=0.0919, over 3881720.88 frames. ], batch size: 56, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:36:25,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1504860.0, ans=0.2 2024-08-12 06:36:29,612 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 15 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 06:36:55,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1505060.0, ans=0.2 2024-08-12 06:36:58,794 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-12 06:37:01,307 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.511e+01 2.832e+01 3.131e+01 5.675e+01, threshold=5.663e+01, percent-clipped=0.0 2024-08-12 06:37:05,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1505160.0, ans=0.0 2024-08-12 06:37:11,332 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 06:37:15,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1505260.0, ans=0.04949747468305833 2024-08-12 06:37:16,267 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5600, loss[loss=0.08808, beats_loss=0.01076, ecapa_loss=0.0001832, whisper_loss=0.07548, over 13984.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01108, ecapa_loss=0.0001807, whisper_loss=0.09227, over 3890700.87 frames. ], batch size: 55, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:37:37,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1505360.0, ans=0.1 2024-08-12 06:37:47,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1505360.0, ans=0.1 2024-08-12 06:37:49,795 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 06:37:50,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1505460.0, ans=0.05 2024-08-12 06:38:02,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1505460.0, ans=0.125 2024-08-12 06:38:26,812 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 06:38:30,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0 2024-08-12 06:38:39,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5650, loss[loss=0.1103, beats_loss=0.009579, ecapa_loss=0.0001498, whisper_loss=0.09918, over 16640.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01108, ecapa_loss=0.0001817, whisper_loss=0.09171, over 3867081.91 frames. 
], batch size: 64, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:38:54,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1505860.0, ans=0.125 2024-08-12 06:39:12,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1505960.0, ans=0.125 2024-08-12 06:39:19,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1505960.0, ans=0.1 2024-08-12 06:39:22,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2024-08-12 06:39:34,270 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 34 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-12 06:39:34,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1506060.0, ans=0.1 2024-08-12 06:39:38,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1506060.0, ans=0.95 2024-08-12 06:39:44,870 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.537e+01 2.761e+01 3.260e+01 5.240e+01, threshold=5.523e+01, percent-clipped=0.0 2024-08-12 06:39:48,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1506160.0, ans=0.0 2024-08-12 06:39:50,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1506160.0, ans=0.125 2024-08-12 06:39:58,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1506260.0, ans=0.0 2024-08-12 06:39:59,383 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5700, loss[loss=0.1114, beats_loss=0.01263, 
ecapa_loss=0.0001615, whisper_loss=0.0972, over 15342.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01113, ecapa_loss=0.0001808, whisper_loss=0.09172, over 3880358.83 frames. ], batch size: 61, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:40:07,623 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-12 06:40:09,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1506260.0, ans=0.2 2024-08-12 06:40:17,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. limit=22.5 2024-08-12 06:40:30,159 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 44 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 06:40:34,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1506460.0, ans=0.125 2024-08-12 06:41:06,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1506660.0, ans=0.2 2024-08-12 06:41:20,892 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 06:41:21,955 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5750, loss[loss=0.1106, beats_loss=0.01269, ecapa_loss=0.000161, whisper_loss=0.0963, over 19376.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01105, ecapa_loss=0.0001803, whisper_loss=0.09206, over 3857437.25 frames. 
], batch size: 77, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:41:27,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=1506760.0, ans=0.1 2024-08-12 06:41:43,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1506860.0, ans=0.0 2024-08-12 06:41:44,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1506860.0, ans=0.0 2024-08-12 06:41:52,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1506960.0, ans=0.0 2024-08-12 06:42:13,276 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 06:42:14,474 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-12 06:42:21,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1507060.0, ans=0.2 2024-08-12 06:42:22,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1507060.0, ans=0.125 2024-08-12 06:42:27,036 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.547e+01 2.860e+01 3.182e+01 5.592e+01, threshold=5.721e+01, percent-clipped=1.0 2024-08-12 06:42:37,867 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=15.0 2024-08-12 06:42:39,155 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 06:42:39,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1507160.0, ans=0.125 2024-08-12 06:42:39,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1507160.0, ans=0.125 2024-08-12 06:42:41,642 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5800, loss[loss=0.09425, beats_loss=0.01039, ecapa_loss=0.0002141, whisper_loss=0.08172, over 21843.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01104, ecapa_loss=0.0001805, whisper_loss=0.09192, over 3846432.04 frames. ], batch size: 94, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:43:18,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1507460.0, ans=0.1 2024-08-12 06:43:37,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1507560.0, ans=0.125 2024-08-12 06:43:48,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1507660.0, ans=0.125 2024-08-12 06:43:50,781 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.69 vs. limit=22.5 2024-08-12 06:43:54,021 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.15 vs. 
limit=22.5 2024-08-12 06:43:55,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1507760.0, ans=0.125 2024-08-12 06:43:55,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5850, loss[loss=0.09353, beats_loss=0.01317, ecapa_loss=0.0001607, whisper_loss=0.07876, over 21901.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01107, ecapa_loss=0.0001808, whisper_loss=0.09173, over 3864914.37 frames. ], batch size: 89, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:43:56,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1507760.0, ans=0.0 2024-08-12 06:44:04,471 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 06:44:26,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1507960.0, ans=0.1 2024-08-12 06:44:31,635 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-12 06:44:47,701 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 06:44:48,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=1508060.0, ans=12.0 2024-08-12 06:44:55,700 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.480e+01 2.777e+01 3.215e+01 5.489e+01, threshold=5.554e+01, percent-clipped=0.0 2024-08-12 06:44:57,407 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-12 06:45:06,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5900, loss[loss=0.1071, beats_loss=0.01167, ecapa_loss=0.0001867, whisper_loss=0.09355, over 22670.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01119, ecapa_loss=0.0001811, whisper_loss=0.09128, over 3881263.71 frames. 
], batch size: 93, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:45:10,174 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 26 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-12 06:45:15,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2024-08-12 06:45:20,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1508360.0, ans=0.125 2024-08-12 06:45:23,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1508360.0, ans=0.2 2024-08-12 06:45:35,530 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 06:45:38,081 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-12 06:45:39,441 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 06:45:48,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1508560.0, ans=0.125 2024-08-12 06:45:52,009 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 06:45:54,786 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 27 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-12 06:45:56,149 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 06:45:58,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.13 vs. 
limit=15.0 2024-08-12 06:46:00,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1508560.0, ans=0.125 2024-08-12 06:46:06,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1508660.0, ans=0.125 2024-08-12 06:46:11,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1508660.0, ans=0.0 2024-08-12 06:46:13,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1508660.0, ans=0.125 2024-08-12 06:46:16,886 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 5950, loss[loss=0.1361, beats_loss=0.007789, ecapa_loss=0.0002261, whisper_loss=0.1261, over 23721.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0112, ecapa_loss=0.0001806, whisper_loss=0.0913, over 3883534.81 frames. ], batch size: 94, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:46:18,787 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 06:46:18,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1508760.0, ans=0.125 2024-08-12 06:46:28,927 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.41 vs. 
limit=22.5 2024-08-12 06:46:34,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1508860.0, ans=0.125 2024-08-12 06:46:38,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1508860.0, ans=0.2 2024-08-12 06:46:47,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1508960.0, ans=0.0 2024-08-12 06:46:49,077 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 06:47:04,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1509060.0, ans=0.1 2024-08-12 06:47:12,783 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 06:47:14,794 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.32 vs. limit=10.0 2024-08-12 06:47:15,170 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.604e+01 2.882e+01 3.325e+01 5.467e+01, threshold=5.764e+01, percent-clipped=0.0 2024-08-12 06:47:26,645 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6000, loss[loss=0.09793, beats_loss=0.01128, ecapa_loss=0.0001809, whisper_loss=0.08484, over 18948.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01115, ecapa_loss=0.00018, whisper_loss=0.09184, over 3908110.30 frames. ], batch size: 74, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:47:26,646 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 06:48:09,255 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.000598, whisper_loss=0.2484, over 922467.00 frames. 
2024-08-12 06:48:27,601 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on SV_voxceleb1: loss=0.004893, beats_loss=0, ecapa_loss=0.0004893, whisper_loss=0, over 939242.00 frames. 2024-08-12 06:49:03,605 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.5623, 2.5345, 2.8143, 2.5591], device='cuda:2') 2024-08-12 06:50:30,783 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on AT_audioset: loss=0.02461, beats_loss=0.02461, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 06:50:30,793 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 06:50:33,676 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 06:50:41,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1509260.0, ans=0.95 2024-08-12 06:50:49,729 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 06:50:56,746 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 06:51:01,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1509460.0, ans=0.0 2024-08-12 06:51:04,328 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 06:51:05,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1509460.0, ans=0.125 2024-08-12 06:51:05,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.15 vs. limit=22.5 2024-08-12 06:51:08,700 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
19 from LS+wenet, 6 from Vox, 33 fro AS 2024-08-12 06:51:32,367 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 06:51:41,933 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6050, loss[loss=0.09346, beats_loss=0.009587, ecapa_loss=0.0002121, whisper_loss=0.08175, over 15124.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01109, ecapa_loss=0.0001791, whisper_loss=0.09241, over 3900356.59 frames. ], batch size: 63, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:51:54,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1509860.0, ans=0.0 2024-08-12 06:52:05,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1509860.0, ans=0.2 2024-08-12 06:52:11,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1509960.0, ans=0.2 2024-08-12 06:52:15,202 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 06:52:39,882 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.536e+01 2.768e+01 3.094e+01 4.494e+01, threshold=5.536e+01, percent-clipped=0.0 2024-08-12 06:52:48,291 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 06:52:50,878 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6100, loss[loss=0.1114, beats_loss=0.01139, ecapa_loss=0.0001816, whisper_loss=0.09818, over 16064.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01118, ecapa_loss=0.0001808, whisper_loss=0.09242, over 3912210.59 frames. 
], batch size: 64, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:52:54,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1510260.0, ans=0.125 2024-08-12 06:53:01,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1510260.0, ans=0.125 2024-08-12 06:53:02,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1510260.0, ans=0.2 2024-08-12 06:53:04,642 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-12 06:53:19,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1510460.0, ans=0.0 2024-08-12 06:53:36,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1510560.0, ans=0.125 2024-08-12 06:53:39,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1510560.0, ans=0.125 2024-08-12 06:53:42,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1510560.0, ans=0.0 2024-08-12 06:53:51,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. limit=10.0 2024-08-12 06:53:52,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1510660.0, ans=0.2 2024-08-12 06:54:00,080 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6150, loss[loss=0.1015, beats_loss=0.01088, ecapa_loss=0.000154, whisper_loss=0.0891, over 15364.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01112, ecapa_loss=0.000181, whisper_loss=0.09248, over 3898662.58 frames. 
], batch size: 60, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:54:00,325 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 28 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-12 06:54:08,306 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 06:54:12,696 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 06:54:25,915 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-12 06:54:50,645 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 06:54:55,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1511160.0, ans=0.125 2024-08-12 06:54:57,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.648e+01 2.944e+01 3.398e+01 5.258e+01, threshold=5.887e+01, percent-clipped=0.0 2024-08-12 06:54:57,709 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 06:55:08,368 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6200, loss[loss=0.1187, beats_loss=0.009536, ecapa_loss=0.0001604, whisper_loss=0.1076, over 24047.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01109, ecapa_loss=0.0001802, whisper_loss=0.09252, over 3891929.32 frames. ], batch size: 90, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:55:30,492 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-12 06:55:36,192 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 06:55:37,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1511460.0, ans=0.125 2024-08-12 06:55:38,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=12.0 2024-08-12 06:55:40,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1511460.0, ans=0.0 2024-08-12 06:55:43,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1511460.0, ans=0.1 2024-08-12 06:55:44,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.31 vs. limit=10.0 2024-08-12 06:55:45,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1511460.0, ans=0.125 2024-08-12 06:55:46,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-12 06:55:47,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1511460.0, ans=0.0 2024-08-12 06:56:01,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=22.5 2024-08-12 06:56:02,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1511660.0, ans=0.0 2024-08-12 06:56:15,760 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
15 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 06:56:17,095 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6250, loss[loss=0.09238, beats_loss=0.01344, ecapa_loss=0.0001596, whisper_loss=0.07734, over 14767.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01109, ecapa_loss=0.0001807, whisper_loss=0.09314, over 3904902.28 frames. ], batch size: 57, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:56:17,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1511760.0, ans=0.125 2024-08-12 06:56:20,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1511760.0, ans=0.1 2024-08-12 06:56:32,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1511860.0, ans=0.07 2024-08-12 06:56:55,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1511960.0, ans=0.125 2024-08-12 06:57:04,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1512060.0, ans=0.1 2024-08-12 06:57:14,366 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 06:57:14,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1512160.0, ans=0.035 2024-08-12 06:57:16,752 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.481e+01 2.801e+01 3.369e+01 5.530e+01, threshold=5.602e+01, percent-clipped=0.0 2024-08-12 06:57:26,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1512160.0, ans=0.0 2024-08-12 06:57:27,142 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
23 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-12 06:57:28,254 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6300, loss[loss=0.0921, beats_loss=0.01461, ecapa_loss=0.0001583, whisper_loss=0.07591, over 22357.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01114, ecapa_loss=0.0001795, whisper_loss=0.09322, over 3923263.67 frames. ], batch size: 92, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:57:28,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1512260.0, ans=0.2 2024-08-12 06:57:35,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1512260.0, ans=0.0 2024-08-12 06:57:37,889 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 06:57:53,990 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-12 06:57:54,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1512360.0, ans=0.125 2024-08-12 06:58:01,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1512460.0, ans=0.2 2024-08-12 06:58:09,709 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 06:58:23,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1512560.0, ans=0.125 2024-08-12 06:58:40,116 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6350, loss[loss=0.09938, beats_loss=0.01108, ecapa_loss=0.0001738, whisper_loss=0.08656, over 18957.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01118, ecapa_loss=0.0001819, whisper_loss=0.09244, over 3909369.93 frames. 
], batch size: 77, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:58:44,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1512760.0, ans=0.2 2024-08-12 06:59:34,575 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 06:59:37,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1513060.0, ans=0.1 2024-08-12 06:59:42,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.576e+01 2.857e+01 3.160e+01 6.267e+01, threshold=5.713e+01, percent-clipped=1.0 2024-08-12 06:59:45,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1513160.0, ans=0.125 2024-08-12 06:59:53,757 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6400, loss[loss=0.1053, beats_loss=0.01136, ecapa_loss=0.0001625, whisper_loss=0.09236, over 19334.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01117, ecapa_loss=0.0001817, whisper_loss=0.092, over 3922350.64 frames. ], batch size: 77, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:59:54,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1513260.0, ans=0.125 2024-08-12 07:00:34,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2024-08-12 07:00:45,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1513560.0, ans=0.125 2024-08-12 07:00:56,252 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 07:01:04,015 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.52 vs. limit=22.5 2024-08-12 07:01:04,840 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 07:01:07,214 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6450, loss[loss=0.1097, beats_loss=0.01061, ecapa_loss=0.0001699, whisper_loss=0.09742, over 22646.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01114, ecapa_loss=0.0001814, whisper_loss=0.09293, over 3963674.19 frames. ], batch size: 88, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:01:20,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1513760.0, ans=0.125 2024-08-12 07:01:25,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1513860.0, ans=0.0 2024-08-12 07:01:32,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1513860.0, ans=0.07 2024-08-12 07:01:47,684 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 07:01:48,388 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.51 vs. 
limit=15.0 2024-08-12 07:02:10,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1514160.0, ans=0.0 2024-08-12 07:02:10,990 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.620e+01 2.898e+01 3.369e+01 4.608e+01, threshold=5.796e+01, percent-clipped=0.0 2024-08-12 07:02:21,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1514260.0, ans=0.125 2024-08-12 07:02:22,413 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6500, loss[loss=0.1019, beats_loss=0.01189, ecapa_loss=0.0002048, whisper_loss=0.08791, over 21812.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01111, ecapa_loss=0.0001817, whisper_loss=0.09295, over 3954763.95 frames. ], batch size: 92, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:02:39,749 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:02:49,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1514360.0, ans=0.0 2024-08-12 07:03:03,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=1514460.0, ans=0.1 2024-08-12 07:03:03,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1514460.0, ans=0.125 2024-08-12 07:03:08,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1514560.0, ans=0.125 2024-08-12 07:03:37,753 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6550, loss[loss=0.1235, beats_loss=0.008142, ecapa_loss=0.0001966, whisper_loss=0.1134, over 21880.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01104, ecapa_loss=0.0001818, whisper_loss=0.09345, over 3940344.84 frames. 
], batch size: 88, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:03:43,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1514760.0, ans=0.0 2024-08-12 07:04:04,047 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 39 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 07:04:11,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1514960.0, ans=0.1 2024-08-12 07:04:14,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1514960.0, ans=0.0 2024-08-12 07:04:23,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1514960.0, ans=0.2 2024-08-12 07:04:38,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1515060.0, ans=0.025 2024-08-12 07:04:42,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1515060.0, ans=0.125 2024-08-12 07:04:47,678 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.605e+01 2.821e+01 3.389e+01 5.277e+01, threshold=5.643e+01, percent-clipped=0.0 2024-08-12 07:04:57,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1515160.0, ans=0.0 2024-08-12 07:05:01,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1515260.0, ans=0.125 2024-08-12 07:05:01,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1515260.0, ans=0.2 2024-08-12 07:05:02,226 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6600, loss[loss=0.0983, beats_loss=0.00857, ecapa_loss=0.0001857, whisper_loss=0.08788, 
over 14615.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01094, ecapa_loss=0.0001819, whisper_loss=0.09443, over 3954791.04 frames. ], batch size: 54, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:05:21,725 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 29 from Vox, 26 from AS 2024-08-12 07:05:34,973 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 36 from Vox, 30 from AS 2024-08-12 07:05:37,867 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 14 from Vox, 27 from AS 2024-08-12 07:05:50,759 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 33 from LS+wenet, 16 from Vox, 27 from AS 2024-08-12 07:05:57,713 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 11 from Vox, 24 from AS 2024-08-12 07:06:01,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2024-08-12 07:06:03,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1515560.0, ans=0.1 2024-08-12 07:06:22,839 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6650, loss[loss=0.1289, beats_loss=0.009442, ecapa_loss=0.0001761, whisper_loss=0.1177, over 19792.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0109, ecapa_loss=0.0001825, whisper_loss=0.09423, over 3928083.32 frames.
], batch size: 76, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:06:48,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1515860.0, ans=0.125 2024-08-12 07:06:57,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1515860.0, ans=0.125 2024-08-12 07:06:59,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1515860.0, ans=10.0 2024-08-12 07:07:02,211 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 from AS 2024-08-12 07:07:48,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.716e+01 3.038e+01 3.399e+01 5.348e+01, threshold=6.076e+01, percent-clipped=0.0 2024-08-12 07:07:53,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1516160.0, ans=0.125 2024-08-12 07:08:00,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1516160.0, ans=0.0 2024-08-12 07:08:06,458 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6700, loss[loss=0.1137, beats_loss=0.008815, ecapa_loss=0.0001632, whisper_loss=0.1033, over 20727.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01094, ecapa_loss=0.0001813, whisper_loss=0.09436, over 3960851.60 frames.
], batch size: 79, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:08:11,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=1516260.0, ans=10.0 2024-08-12 07:08:16,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1516260.0, ans=0.1 2024-08-12 07:08:25,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1516360.0, ans=0.1 2024-08-12 07:08:59,443 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 from AS 2024-08-12 07:09:08,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1516560.0, ans=0.05 2024-08-12 07:09:13,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1516560.0, ans=0.95 2024-08-12 07:09:23,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1516660.0, ans=0.125 2024-08-12 07:09:27,085 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 18 from Vox, 39 from AS 2024-08-12 07:09:29,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1516660.0, ans=0.1 2024-08-12 07:09:43,922 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6750, loss[loss=0.08979, beats_loss=0.01121, ecapa_loss=0.0002041, whisper_loss=0.07653, over 20387.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01096, ecapa_loss=0.0001813, whisper_loss=0.09317, over 3915098.60 frames.
], batch size: 84, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:10:10,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1516860.0, ans=0.2 2024-08-12 07:10:10,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2024-08-12 07:10:37,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1516960.0, ans=0.125 2024-08-12 07:10:43,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2024-08-12 07:11:07,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.593e+01 2.755e+01 3.178e+01 4.521e+01, threshold=5.509e+01, percent-clipped=0.0 2024-08-12 07:11:21,649 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 from AS 2024-08-12 07:11:24,980 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6800, loss[loss=0.09049, beats_loss=0.01373, ecapa_loss=0.0001711, whisper_loss=0.07505, over 18801.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01095, ecapa_loss=0.0001801, whisper_loss=0.09346, over 3931451.16 frames. ], batch size: 79, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:11:44,155 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 from AS 2024-08-12 07:11:49,821 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 22 from Vox, 47 from AS 2024-08-12 07:11:51,018 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 from AS 2024-08-12 07:11:57,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.74 vs.
limit=15.0 2024-08-12 07:12:23,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1517560.0, ans=0.09899494936611666 2024-08-12 07:12:33,546 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 28 from Vox, 42 from AS 2024-08-12 07:12:41,871 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6850, loss[loss=0.1052, beats_loss=0.01209, ecapa_loss=0.0001677, whisper_loss=0.09148, over 17076.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01101, ecapa_loss=0.0001793, whisper_loss=0.09319, over 3922566.83 frames. ], batch size: 69, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:12:49,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1517760.0, ans=0.0 2024-08-12 07:13:02,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1517860.0, ans=0.0 2024-08-12 07:13:05,797 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 from AS 2024-08-12 07:13:08,579 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 21 from Vox, 29 from AS 2024-08-12 07:13:11,489 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 21 from LS+wenet, 29 from Vox, 43 from AS 2024-08-12 07:13:18,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=1517960.0, ans=22.5 2024-08-12 07:13:20,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1517960.0, ans=0.125 2024-08-12 07:13:21,845 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 from AS 2024-08-12 07:13:37,975 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.82 vs.
limit=15.0 2024-08-12 07:13:42,172 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.608e+01 2.859e+01 3.356e+01 1.905e+02, threshold=5.718e+01, percent-clipped=1.0 2024-08-12 07:13:50,305 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 from AS 2024-08-12 07:13:53,860 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6900, loss[loss=0.1093, beats_loss=0.0126, ecapa_loss=0.0001702, whisper_loss=0.09498, over 22657.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01107, ecapa_loss=0.0001788, whisper_loss=0.09339, over 3915114.22 frames. ], batch size: 90, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:13:59,926 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 from AS 2024-08-12 07:14:02,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.83 vs. limit=10.0 2024-08-12 07:14:11,043 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 from AS 2024-08-12 07:14:34,897 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 from AS 2024-08-12 07:14:42,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=22.5 2024-08-12 07:14:42,906 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.10 vs. limit=15.0 2024-08-12 07:14:49,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1518560.0, ans=0.125 2024-08-12 07:14:50,277 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
38 from LS+wenet, 21 from Vox, 33 from AS 2024-08-12 07:15:05,760 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 6950, loss[loss=0.1103, beats_loss=0.01187, ecapa_loss=0.0001682, whisper_loss=0.09672, over 23223.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01118, ecapa_loss=0.0001777, whisper_loss=0.09301, over 3884618.66 frames. ], batch size: 93, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:15:13,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0 2024-08-12 07:15:23,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0 2024-08-12 07:15:40,539 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 from AS 2024-08-12 07:15:40,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1518960.0, ans=0.0 2024-08-12 07:15:42,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1518960.0, ans=0.2 2024-08-12 07:16:04,114 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.463e+01 2.859e+01 3.118e+01 2.003e+02, threshold=5.718e+01, percent-clipped=2.0 2024-08-12 07:16:14,925 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7000, loss[loss=0.09631, beats_loss=0.01369, ecapa_loss=0.0001542, whisper_loss=0.08108, over 17226.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01125, ecapa_loss=0.0001788, whisper_loss=0.09225, over 3870586.00 frames.
], batch size: 70, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:16:21,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1519260.0, ans=0.125 2024-08-12 07:16:41,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1519360.0, ans=0.0 2024-08-12 07:16:56,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1519560.0, ans=0.125 2024-08-12 07:17:03,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1519560.0, ans=0.0 2024-08-12 07:17:08,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0 2024-08-12 07:17:11,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0 2024-08-12 07:17:11,735 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 from AS 2024-08-12 07:17:22,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1519660.0, ans=0.1 2024-08-12 07:17:25,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7050, loss[loss=0.1017, beats_loss=0.01286, ecapa_loss=0.0001356, whisper_loss=0.0875, over 16761.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01122, ecapa_loss=0.0001789, whisper_loss=0.09206, over 3842962.75 frames. ], batch size: 64, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:17:26,850 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts.
15 from LS+wenet, 23 from Vox, 22 from AS 2024-08-12 07:17:44,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1519860.0, ans=0.0 2024-08-12 07:18:07,084 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.43 vs. limit=10.0 2024-08-12 07:18:09,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1519960.0, ans=0.125 2024-08-12 07:18:10,562 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 20 from Vox, 38 from AS 2024-08-12 07:18:28,961 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.499e+01 2.770e+01 3.110e+01 4.662e+01, threshold=5.540e+01, percent-clipped=0.0 2024-08-12 07:18:29,170 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 from AS 2024-08-12 07:18:36,190 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 11 from Vox, 42 from AS 2024-08-12 07:18:37,651 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 from AS 2024-08-12 07:18:38,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.19 vs. limit=12.0 2024-08-12 07:18:40,184 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7100, loss[loss=0.1146, beats_loss=0.009191, ecapa_loss=0.0002569, whisper_loss=0.1028, over 16523.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01126, ecapa_loss=0.0001771, whisper_loss=0.09157, over 3837204.62 frames.
], batch size: 70, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:18:45,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1520260.0, ans=0.125 2024-08-12 07:18:53,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2024-08-12 07:19:14,466 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 25 from LS+wenet, 13 from Vox, 19 from AS 2024-08-12 07:19:32,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1520560.0, ans=0.0 2024-08-12 07:19:35,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1520560.0, ans=0.125 2024-08-12 07:19:53,476 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 25 from Vox, 30 from AS 2024-08-12 07:19:54,506 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7150, loss[loss=0.1222, beats_loss=0.009733, ecapa_loss=0.0002512, whisper_loss=0.11, over 20571.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01126, ecapa_loss=0.000175, whisper_loss=0.09136, over 3853975.06 frames. ], batch size: 88, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:20:05,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2024-08-12 07:20:10,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1520860.0, ans=0.1 2024-08-12 07:20:25,751 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts.
23 from LS+wenet, 15 from Vox, 31 from AS 2024-08-12 07:20:40,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1521060.0, ans=0.125 2024-08-12 07:20:41,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1521060.0, ans=0.125 2024-08-12 07:20:42,220 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2024-08-12 07:20:45,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-08-12 07:20:55,730 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.570e+01 2.914e+01 3.125e+01 1.770e+02, threshold=5.828e+01, percent-clipped=1.0 2024-08-12 07:21:07,466 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7200, loss[loss=0.1152, beats_loss=0.01151, ecapa_loss=0.0001433, whisper_loss=0.1023, over 16814.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01122, ecapa_loss=0.0001745, whisper_loss=0.09212, over 3895323.60 frames. ], batch size: 64, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:21:16,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5 2024-08-12 07:21:40,690 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 32 from Vox, 32 from AS 2024-08-12 07:21:57,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1521560.0, ans=0.0 2024-08-12 07:22:08,952 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts.
27 from LS+wenet, 22 from Vox, 40 from AS 2024-08-12 07:22:15,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1521660.0, ans=0.0 2024-08-12 07:22:22,203 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7250, loss[loss=0.1123, beats_loss=0.01191, ecapa_loss=0.0001774, whisper_loss=0.09859, over 22352.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0112, ecapa_loss=0.0001763, whisper_loss=0.09155, over 3891045.30 frames. ], batch size: 90, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:22:23,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1521760.0, ans=0.1 2024-08-12 07:22:41,061 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 29 from Vox, 36 from AS 2024-08-12 07:22:48,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1521860.0, ans=0.0 2024-08-12 07:23:01,956 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.97 vs.
limit=6.0 2024-08-12 07:23:19,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1522060.0, ans=0.0 2024-08-12 07:23:24,534 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.489e+01 2.804e+01 3.145e+01 4.718e+01, threshold=5.607e+01, percent-clipped=0.0 2024-08-12 07:23:30,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1522160.0, ans=0.1 2024-08-12 07:23:31,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1522160.0, ans=0.125 2024-08-12 07:23:36,439 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7300, loss[loss=0.1137, beats_loss=0.01101, ecapa_loss=0.0001716, whisper_loss=0.101, over 21168.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0112, ecapa_loss=0.000177, whisper_loss=0.09182, over 3899005.25 frames. ], batch size: 86, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:23:44,173 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 17 from LS+wenet, 24 from Vox, 40 from AS 2024-08-12 07:23:53,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.39 vs. limit=22.5 2024-08-12 07:24:02,113 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 24 from Vox, 38 from AS 2024-08-12 07:24:17,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1522460.0, ans=0.125 2024-08-12 07:24:18,345 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
34 from LS+wenet, 26 from Vox, 27 from AS 2024-08-12 07:24:22,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1522560.0, ans=0.125 2024-08-12 07:24:27,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1522560.0, ans=0.125 2024-08-12 07:24:28,489 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 22 from Vox, 27 from AS 2024-08-12 07:24:32,947 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 from AS 2024-08-12 07:24:37,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1522660.0, ans=0.125 2024-08-12 07:24:38,474 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-12 07:24:46,010 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 22 from Vox, 29 from AS 2024-08-12 07:24:49,932 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7350, loss[loss=0.09565, beats_loss=0.01236, ecapa_loss=0.0001522, whisper_loss=0.08176, over 15895.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01123, ecapa_loss=0.0001762, whisper_loss=0.09148, over 3890605.08 frames. ], batch size: 63, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:25:11,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.35 vs.
limit=22.5 2024-08-12 07:25:33,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1523060.0, ans=0.2 2024-08-12 07:25:36,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1523060.0, ans=0.0 2024-08-12 07:25:44,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.47 vs. limit=10.0 2024-08-12 07:25:45,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1523060.0, ans=0.1 2024-08-12 07:25:52,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.567e+01 3.033e+01 3.476e+01 4.624e+01, threshold=6.066e+01, percent-clipped=0.0 2024-08-12 07:25:58,540 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 23 from Vox, 27 from AS 2024-08-12 07:26:03,990 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7400, loss[loss=0.09343, beats_loss=0.01265, ecapa_loss=0.000184, whisper_loss=0.07894, over 21529.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01118, ecapa_loss=0.0001772, whisper_loss=0.09151, over 3912075.72 frames. ], batch size: 88, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:26:20,056 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 17 from Vox, 35 from AS 2024-08-12 07:26:25,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1523360.0, ans=0.125 2024-08-12 07:26:28,178 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.64 vs. limit=22.5 2024-08-12 07:26:28,786 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts.
16 from LS+wenet, 19 from Vox, 25 from AS 2024-08-12 07:26:32,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1523460.0, ans=0.125 2024-08-12 07:26:33,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1523460.0, ans=0.125 2024-08-12 07:26:33,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.01 vs. limit=22.5 2024-08-12 07:26:41,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-08-12 07:26:48,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1523560.0, ans=0.125 2024-08-12 07:26:53,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1523560.0, ans=0.125 2024-08-12 07:26:58,024 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 16 from LS+wenet, 20 from Vox, 48 from AS 2024-08-12 07:27:07,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1523660.0, ans=0.1 2024-08-12 07:27:12,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1523660.0, ans=0.125 2024-08-12 07:27:17,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1523760.0, ans=0.0 2024-08-12 07:27:18,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7450, loss[loss=0.09596, beats_loss=0.0146, ecapa_loss=0.0001425, whisper_loss=0.07994, over 22642.00 frames.
], tot_loss[loss=0.1038, beats_loss=0.01125, ecapa_loss=0.0001783, whisper_loss=0.09075, over 3914001.15 frames. ], batch size: 92, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:27:23,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1523760.0, ans=0.0 2024-08-12 07:27:24,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1523760.0, ans=0.125 2024-08-12 07:27:28,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1523760.0, ans=0.1 2024-08-12 07:27:34,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.07 vs. limit=10.0 2024-08-12 07:27:42,439 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 from AS 2024-08-12 07:27:47,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1523960.0, ans=0.125 2024-08-12 07:27:54,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1523960.0, ans=0.0 2024-08-12 07:27:56,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1523960.0, ans=0.1 2024-08-12 07:28:00,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1523960.0, ans=0.125 2024-08-12 07:28:01,528 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts.
33 from LS+wenet, 14 from Vox, 46 from AS 2024-08-12 07:28:20,895 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.662e+01 2.945e+01 3.324e+01 4.940e+01, threshold=5.890e+01, percent-clipped=0.0 2024-08-12 07:28:29,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1524160.0, ans=0.125 2024-08-12 07:28:31,834 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7500, loss[loss=0.1127, beats_loss=0.01184, ecapa_loss=0.0001743, whisper_loss=0.0991, over 22299.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01123, ecapa_loss=0.0001799, whisper_loss=0.09105, over 3901342.36 frames. ], batch size: 93, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:28:35,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1524260.0, ans=0.1 2024-08-12 07:28:41,062 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 29 from Vox, 31 from AS 2024-08-12 07:28:44,170 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:28:52,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1524360.0, ans=0.2 2024-08-12 07:29:08,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1524460.0, ans=0.1 2024-08-12 07:29:16,618 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts.
17 from LS+wenet, 19 from Vox, 32 from AS 2024-08-12 07:29:17,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1524560.0, ans=22.5 2024-08-12 07:29:25,690 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.510e-01 2024-08-12 07:29:25,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1524560.0, ans=0.125 2024-08-12 07:29:34,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1524660.0, ans=0.125 2024-08-12 07:29:36,586 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 from AS 2024-08-12 07:29:43,491 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7550, loss[loss=0.06949, beats_loss=0.01366, ecapa_loss=0.0001668, whisper_loss=0.05417, over 15629.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01125, ecapa_loss=0.0001813, whisper_loss=0.0909, over 3895085.75 frames. ], batch size: 64, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:29:46,718 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 from AS 2024-08-12 07:30:09,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1524860.0, ans=0.125 2024-08-12 07:30:13,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1524960.0, ans=0.2 2024-08-12 07:30:21,572 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts.
18 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 07:30:21,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1524960.0, ans=0.0 2024-08-12 07:30:28,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1525060.0, ans=0.1 2024-08-12 07:30:46,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.510e+01 2.746e+01 3.098e+01 2.240e+02, threshold=5.492e+01, percent-clipped=2.0 2024-08-12 07:30:57,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1525260.0, ans=0.125 2024-08-12 07:30:57,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1525260.0, ans=0.0 2024-08-12 07:30:58,411 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7600, loss[loss=0.1174, beats_loss=0.009724, ecapa_loss=0.0001802, whisper_loss=0.1059, over 14392.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01114, ecapa_loss=0.0001813, whisper_loss=0.09174, over 3881982.87 frames. ], batch size: 55, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:31:03,968 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 07:31:13,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1525360.0, ans=0.125 2024-08-12 07:31:16,211 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 07:31:28,254 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 07:31:42,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.77 vs. 
limit=15.0 2024-08-12 07:31:46,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1525560.0, ans=0.125 2024-08-12 07:31:46,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.70 vs. limit=10.0 2024-08-12 07:31:57,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1525660.0, ans=0.125 2024-08-12 07:32:01,975 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 07:32:12,200 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7650, loss[loss=0.1164, beats_loss=0.01191, ecapa_loss=0.0001591, whisper_loss=0.1029, over 23334.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01108, ecapa_loss=0.0001818, whisper_loss=0.09174, over 3869755.90 frames. ], batch size: 91, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:32:15,371 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-12 07:32:20,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1525760.0, ans=0.125 2024-08-12 07:32:35,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1525860.0, ans=0.125 2024-08-12 07:32:40,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1525960.0, ans=15.0 2024-08-12 07:33:13,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.517e+01 2.819e+01 3.143e+01 1.705e+02, threshold=5.638e+01, percent-clipped=1.0 2024-08-12 07:33:23,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1526160.0, ans=0.0 2024-08-12 07:33:25,140 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7700, loss[loss=0.1143, beats_loss=0.009039, ecapa_loss=0.0001895, whisper_loss=0.1033, over 17443.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01103, ecapa_loss=0.0001809, whisper_loss=0.09202, over 3893290.58 frames. ], batch size: 67, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:33:26,957 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-12 07:33:32,668 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-12 07:33:35,576 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 07:33:46,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1526360.0, ans=0.0 2024-08-12 07:33:54,239 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 07:33:56,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1526460.0, ans=0.5 2024-08-12 07:33:59,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1526460.0, ans=0.125 2024-08-12 07:34:31,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1526660.0, ans=0.0 2024-08-12 07:34:41,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1526760.0, ans=0.0 2024-08-12 07:34:42,731 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7750, loss[loss=0.09635, beats_loss=0.01049, ecapa_loss=0.0001856, whisper_loss=0.084, over 20876.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01108, ecapa_loss=0.0001803, whisper_loss=0.09163, over 3883340.09 frames. ], batch size: 86, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:34:47,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1526760.0, ans=0.0 2024-08-12 07:34:57,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1526860.0, ans=0.95 2024-08-12 07:35:37,451 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 07:35:44,883 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.468e+01 2.726e+01 3.182e+01 4.341e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-12 07:35:56,398 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7800, loss[loss=0.1119, beats_loss=0.01005, ecapa_loss=0.000167, whisper_loss=0.1002, over 22273.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0111, ecapa_loss=0.0001793, whisper_loss=0.09184, over 3909270.79 frames. 
], batch size: 87, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:35:59,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1527260.0, ans=0.1 2024-08-12 07:36:11,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1527360.0, ans=0.1 2024-08-12 07:36:19,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1527360.0, ans=0.125 2024-08-12 07:36:21,606 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-12 07:36:25,470 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=22.5 2024-08-12 07:36:28,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.84 vs. limit=10.0 2024-08-12 07:37:04,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1527660.0, ans=0.0 2024-08-12 07:37:09,954 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7850, loss[loss=0.08626, beats_loss=0.01352, ecapa_loss=0.0001368, whisper_loss=0.07137, over 16927.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01116, ecapa_loss=0.00018, whisper_loss=0.09172, over 3892750.40 frames. ], batch size: 68, lr: 5.73e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:37:26,532 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. 
limit=6.0 2024-08-12 07:37:28,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1527860.0, ans=0.125 2024-08-12 07:37:30,061 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 07:37:33,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1527860.0, ans=0.2 2024-08-12 07:37:45,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1527960.0, ans=0.125 2024-08-12 07:37:51,194 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 07:38:04,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1528060.0, ans=0.125 2024-08-12 07:38:04,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1528060.0, ans=0.2 2024-08-12 07:38:06,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1528060.0, ans=0.125 2024-08-12 07:38:13,760 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.514e+01 2.915e+01 3.388e+01 6.482e+01, threshold=5.829e+01, percent-clipped=1.0 2024-08-12 07:38:16,678 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 07:38:25,062 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7900, loss[loss=0.1138, beats_loss=0.01098, ecapa_loss=0.0001651, whisper_loss=0.1012, over 22690.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01114, ecapa_loss=0.0001789, whisper_loss=0.09261, over 3909818.66 frames. ], batch size: 90, lr: 5.73e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:38:35,192 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 07:39:00,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1528460.0, ans=0.0 2024-08-12 07:39:00,375 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.87 vs. limit=22.5 2024-08-12 07:39:04,635 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 07:39:08,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1528560.0, ans=0.1 2024-08-12 07:39:19,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1528560.0, ans=0.125 2024-08-12 07:39:37,937 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 7950, loss[loss=0.1264, beats_loss=0.00968, ecapa_loss=0.0002041, whisper_loss=0.1147, over 24058.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01116, ecapa_loss=0.0001794, whisper_loss=0.09237, over 3883620.55 frames. ], batch size: 93, lr: 5.73e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:39:45,284 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 07:39:51,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1528860.0, ans=0.125 2024-08-12 07:39:59,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2024-08-12 07:40:04,013 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.55 vs. 
limit=22.5 2024-08-12 07:40:40,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.548e+01 3.030e+01 3.373e+01 4.598e+01, threshold=6.060e+01, percent-clipped=0.0 2024-08-12 07:40:52,151 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8000, loss[loss=0.07608, beats_loss=0.01413, ecapa_loss=0.000163, whisper_loss=0.06031, over 17725.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01116, ecapa_loss=0.0001792, whisper_loss=0.09239, over 3903415.15 frames. ], batch size: 73, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:40:55,809 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 07:40:58,760 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 07:40:59,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1529260.0, ans=0.125 2024-08-12 07:41:02,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1529260.0, ans=0.0 2024-08-12 07:41:33,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1529460.0, ans=0.04949747468305833 2024-08-12 07:41:39,768 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 35 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-12 07:41:39,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1529560.0, ans=10.0 2024-08-12 07:41:49,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-12 07:41:55,734 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
15 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-12 07:42:01,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1529660.0, ans=0.1 2024-08-12 07:42:04,683 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.276e+05 2024-08-12 07:42:07,776 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8050, loss[loss=0.1277, beats_loss=0.009559, ecapa_loss=0.0001646, whisper_loss=0.1165, over 20740.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01111, ecapa_loss=0.0001793, whisper_loss=0.09337, over 3898792.91 frames. ], batch size: 78, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:42:12,342 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 07:42:24,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1529860.0, ans=0.0 2024-08-12 07:42:24,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2024-08-12 07:42:41,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1529960.0, ans=0.125 2024-08-12 07:42:43,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2024-08-12 07:42:50,124 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-12 07:42:54,447 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 07:42:57,182 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
19 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 07:43:08,886 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.431e+01 2.661e+01 3.080e+01 6.684e+01, threshold=5.323e+01, percent-clipped=1.0 2024-08-12 07:43:13,524 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 07:43:21,032 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8100, loss[loss=0.1027, beats_loss=0.01136, ecapa_loss=0.0001946, whisper_loss=0.08935, over 21797.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01111, ecapa_loss=0.0001799, whisper_loss=0.093, over 3867535.35 frames. ], batch size: 91, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:43:34,292 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 07:43:41,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-08-12 07:43:54,821 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 07:44:03,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1530460.0, ans=0.5 2024-08-12 07:44:13,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1530560.0, ans=0.09899494936611666 2024-08-12 07:44:19,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1530560.0, ans=0.125 2024-08-12 07:44:26,370 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
21 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-12 07:44:31,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1530660.0, ans=0.0 2024-08-12 07:44:37,199 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8150, loss[loss=0.1201, beats_loss=0.01209, ecapa_loss=0.0001514, whisper_loss=0.1065, over 22770.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01107, ecapa_loss=0.0001802, whisper_loss=0.09291, over 3837929.13 frames. ], batch size: 89, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:44:47,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1530760.0, ans=0.0 2024-08-12 07:44:49,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1530760.0, ans=0.0 2024-08-12 07:44:53,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1530860.0, ans=0.0 2024-08-12 07:44:57,725 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 07:45:13,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1530960.0, ans=0.125 2024-08-12 07:45:32,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1531060.0, ans=0.09899494936611666 2024-08-12 07:45:33,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1531060.0, ans=0.0 2024-08-12 07:45:36,179 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
35 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-12 07:45:38,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.559e+01 2.858e+01 3.192e+01 6.698e+01, threshold=5.715e+01, percent-clipped=1.0 2024-08-12 07:45:44,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1531160.0, ans=0.0 2024-08-12 07:45:50,518 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8200, loss[loss=0.1187, beats_loss=0.01099, ecapa_loss=0.0002302, whisper_loss=0.1054, over 22057.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01111, ecapa_loss=0.0001817, whisper_loss=0.09209, over 3870462.49 frames. ], batch size: 92, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:45:50,677 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 07:47:01,860 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8250, loss[loss=0.1041, beats_loss=0.01233, ecapa_loss=0.0001709, whisper_loss=0.09002, over 21689.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01115, ecapa_loss=0.0001812, whisper_loss=0.09167, over 3872341.11 frames. ], batch size: 89, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:47:15,032 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 15 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 07:47:15,492 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.931e+00 2024-08-12 07:47:16,502 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-12 07:47:20,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5 2024-08-12 07:47:24,081 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
36 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-12 07:47:25,815 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 07:47:31,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1531960.0, ans=0.0 2024-08-12 07:47:36,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1531960.0, ans=0.0 2024-08-12 07:47:47,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1532060.0, ans=0.0 2024-08-12 07:47:48,764 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 07:47:52,957 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 07:48:03,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.593e+01 2.850e+01 3.386e+01 5.334e+01, threshold=5.700e+01, percent-clipped=0.0 2024-08-12 07:48:05,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1532160.0, ans=0.1 2024-08-12 07:48:14,120 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8300, loss[loss=0.09039, beats_loss=0.01243, ecapa_loss=0.0001875, whisper_loss=0.07608, over 22133.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01113, ecapa_loss=0.0001811, whisper_loss=0.09152, over 3861692.33 frames. ], batch size: 94, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:48:22,582 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 07:48:31,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.33 vs. 
limit=12.0 2024-08-12 07:48:44,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1532460.0, ans=0.0 2024-08-12 07:48:54,630 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 07:49:00,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1532560.0, ans=0.2 2024-08-12 07:49:02,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.46 vs. limit=15.0 2024-08-12 07:49:14,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1532660.0, ans=0.125 2024-08-12 07:49:19,884 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:49:23,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8350, loss[loss=0.1104, beats_loss=0.01023, ecapa_loss=0.0001883, whisper_loss=0.09826, over 22227.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0112, ecapa_loss=0.0001808, whisper_loss=0.09082, over 3875899.56 frames. 
], batch size: 90, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:49:23,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1532760.0, ans=0.125 2024-08-12 07:49:23,640 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:49:26,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1532760.0, ans=0.125 2024-08-12 07:49:32,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1532760.0, ans=0.05 2024-08-12 07:49:32,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1532760.0, ans=0.125 2024-08-12 07:49:33,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1532760.0, ans=0.0 2024-08-12 07:50:04,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1533060.0, ans=0.125 2024-08-12 07:50:17,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.46 vs. 
limit=22.5 2024-08-12 07:50:23,755 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.502e+01 2.920e+01 3.300e+01 7.763e+01, threshold=5.841e+01, percent-clipped=2.0 2024-08-12 07:50:28,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1533160.0, ans=0.125 2024-08-12 07:50:32,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1533260.0, ans=0.0 2024-08-12 07:50:33,512 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8400, loss[loss=0.08808, beats_loss=0.01229, ecapa_loss=0.0001615, whisper_loss=0.07417, over 14107.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01109, ecapa_loss=0.0001822, whisper_loss=0.09078, over 3853212.87 frames. ], batch size: 57, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:50:34,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1533260.0, ans=0.0 2024-08-12 07:50:35,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1533260.0, ans=0.025 2024-08-12 07:50:40,266 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:51:12,864 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 07:51:40,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1533660.0, ans=0.5 2024-08-12 07:51:45,275 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8450, loss[loss=0.09957, beats_loss=0.01006, ecapa_loss=0.000175, whisper_loss=0.08776, over 15797.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01105, ecapa_loss=0.0001821, whisper_loss=0.09143, over 3849054.62 frames. 
], batch size: 61, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:51:52,568 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 07:52:18,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1533960.0, ans=0.125 2024-08-12 07:52:31,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1534060.0, ans=0.0 2024-08-12 07:52:37,011 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 07:52:42,719 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 07:52:46,733 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.460e+01 2.717e+01 3.180e+01 4.918e+01, threshold=5.434e+01, percent-clipped=0.0 2024-08-12 07:52:56,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8500, loss[loss=0.08355, beats_loss=0.01425, ecapa_loss=0.0001354, whisper_loss=0.06794, over 16743.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01104, ecapa_loss=0.0001813, whisper_loss=0.09183, over 3897234.95 frames. ], batch size: 65, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:53:05,394 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:53:05,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1534260.0, ans=0.05 2024-08-12 07:53:06,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1534260.0, ans=0.0 2024-08-12 07:53:09,466 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
24 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 07:53:09,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1534360.0, ans=0.0 2024-08-12 07:53:15,214 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 07:53:23,746 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-12 07:53:27,777 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 33 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 07:53:30,867 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 07:53:36,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1534460.0, ans=0.0 2024-08-12 07:53:37,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1534560.0, ans=0.1 2024-08-12 07:53:46,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1534560.0, ans=0.125 2024-08-12 07:53:56,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1534660.0, ans=0.0 2024-08-12 07:54:03,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1534660.0, ans=0.07 2024-08-12 07:54:07,499 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8550, loss[loss=0.1256, beats_loss=0.00956, ecapa_loss=0.0002186, whisper_loss=0.1138, over 22513.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01092, ecapa_loss=0.000181, whisper_loss=0.09309, over 3906731.52 frames. ], batch size: 88, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:54:11,903 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
14 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 07:54:25,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1534860.0, ans=0.125 2024-08-12 07:54:38,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1534960.0, ans=0.125 2024-08-12 07:55:09,369 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.569e+01 2.940e+01 3.392e+01 6.119e+01, threshold=5.880e+01, percent-clipped=2.0 2024-08-12 07:55:10,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1535160.0, ans=0.1 2024-08-12 07:55:12,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1535160.0, ans=0.125 2024-08-12 07:55:19,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8600, loss[loss=0.109, beats_loss=0.00847, ecapa_loss=0.0002509, whisper_loss=0.09799, over 14417.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01095, ecapa_loss=0.0001817, whisper_loss=0.09253, over 3885838.00 frames. ], batch size: 59, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:55:49,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1535460.0, ans=0.125 2024-08-12 07:55:50,234 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=12.0 2024-08-12 07:55:54,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1535460.0, ans=0.1 2024-08-12 07:56:31,649 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8650, loss[loss=0.08587, beats_loss=0.01478, ecapa_loss=0.0001526, whisper_loss=0.06956, over 21926.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01102, ecapa_loss=0.0001812, whisper_loss=0.09225, over 3889225.64 frames. ], batch size: 91, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:56:48,738 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.43 vs. limit=22.5 2024-08-12 07:57:05,427 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 07:57:24,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1536060.0, ans=0.2 2024-08-12 07:57:29,726 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.80 vs. limit=22.5 2024-08-12 07:57:33,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1536160.0, ans=0.0 2024-08-12 07:57:34,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.601e+01 2.885e+01 3.263e+01 5.509e+01, threshold=5.770e+01, percent-clipped=0.0 2024-08-12 07:57:34,855 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 07:57:42,072 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-12 07:57:44,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1536260.0, ans=0.0 2024-08-12 07:57:45,123 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8700, loss[loss=0.07993, beats_loss=0.01516, ecapa_loss=0.0001434, whisper_loss=0.06334, over 16868.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01105, ecapa_loss=0.0001793, whisper_loss=0.09218, over 3889133.76 frames. 
], batch size: 67, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:57:46,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1536260.0, ans=0.125 2024-08-12 07:57:58,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1536360.0, ans=0.125 2024-08-12 07:58:04,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-08-12 07:58:11,927 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2024-08-12 07:58:17,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1536460.0, ans=0.0 2024-08-12 07:58:20,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1536460.0, ans=0.125 2024-08-12 07:58:34,876 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 07:58:57,670 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8750, loss[loss=0.1332, beats_loss=0.008645, ecapa_loss=0.000163, whisper_loss=0.1229, over 17095.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01109, ecapa_loss=0.0001792, whisper_loss=0.0916, over 3869537.50 frames. 
], batch size: 66, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:59:05,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1536760.0, ans=0.125 2024-08-12 07:59:09,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1536760.0, ans=0.125 2024-08-12 07:59:09,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1536760.0, ans=0.125 2024-08-12 07:59:11,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1536860.0, ans=0.2 2024-08-12 07:59:17,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1536860.0, ans=0.125 2024-08-12 07:59:22,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1536860.0, ans=0.05 2024-08-12 07:59:35,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=12.0 2024-08-12 07:59:36,692 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-12 07:59:39,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1537060.0, ans=0.125 2024-08-12 07:59:45,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1537060.0, ans=0.125 2024-08-12 07:59:59,286 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.459e+01 2.752e+01 3.200e+01 4.704e+01, threshold=5.505e+01, percent-clipped=0.0 2024-08-12 08:00:01,254 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
20 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-12 08:00:09,636 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8800, loss[loss=0.09862, beats_loss=0.01177, ecapa_loss=0.0001835, whisper_loss=0.08501, over 20690.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01114, ecapa_loss=0.0001788, whisper_loss=0.09124, over 3879450.30 frames. ], batch size: 81, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:00:24,867 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2024-08-12 08:00:30,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1537360.0, ans=0.125 2024-08-12 08:00:30,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=12.0 2024-08-12 08:00:30,508 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2024-08-12 08:00:34,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1537360.0, ans=0.125 2024-08-12 08:00:35,326 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 32 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 08:00:38,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1537460.0, ans=0.125 2024-08-12 08:00:50,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1537460.0, ans=0.0 2024-08-12 08:00:50,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.35 vs. 
limit=12.0 2024-08-12 08:01:11,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1537660.0, ans=0.0 2024-08-12 08:01:16,901 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 08:01:21,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=22.5 2024-08-12 08:01:22,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8850, loss[loss=0.07515, beats_loss=0.01031, ecapa_loss=0.0001884, whisper_loss=0.06295, over 18044.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01113, ecapa_loss=0.000178, whisper_loss=0.0914, over 3902125.58 frames. ], batch size: 72, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:01:28,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1537760.0, ans=0.1 2024-08-12 08:01:40,632 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 08:01:43,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1537860.0, ans=0.125 2024-08-12 08:01:49,301 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 08:02:00,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1537960.0, ans=0.2 2024-08-12 08:02:18,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1538060.0, ans=0.125 2024-08-12 08:02:26,643 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.513e+01 2.817e+01 3.159e+01 3.465e+02, threshold=5.633e+01, percent-clipped=4.0 2024-08-12 08:02:27,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1538160.0, ans=0.125 2024-08-12 08:02:36,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8900, loss[loss=0.1216, beats_loss=0.00978, ecapa_loss=0.0002118, whisper_loss=0.1097, over 17450.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01112, ecapa_loss=0.0001786, whisper_loss=0.09166, over 3876743.76 frames. ], batch size: 71, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:02:37,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1538260.0, ans=0.2 2024-08-12 08:03:02,879 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.45 vs. limit=15.0 2024-08-12 08:03:13,930 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 08:03:40,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1538660.0, ans=0.07 2024-08-12 08:03:43,527 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 08:03:50,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 8950, loss[loss=0.09921, beats_loss=0.009532, ecapa_loss=0.0001701, whisper_loss=0.08798, over 15062.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01108, ecapa_loss=0.0001792, whisper_loss=0.09231, over 3881322.70 frames. ], batch size: 56, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:04:04,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1538860.0, ans=0.125 2024-08-12 08:04:12,471 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 08:04:52,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.556e+01 2.825e+01 3.281e+01 7.768e+01, threshold=5.651e+01, percent-clipped=1.0 2024-08-12 08:05:02,502 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9000, loss[loss=0.09024, beats_loss=0.01413, ecapa_loss=0.000122, whisper_loss=0.07489, over 18984.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01124, ecapa_loss=0.000179, whisper_loss=0.09114, over 3879052.28 frames. ], batch size: 73, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:05:02,504 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 08:05:41,673 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on ASR_libri: loss=0.2556, beats_loss=0, ecapa_loss=0.0006109, whisper_loss=0.2495, over 922467.00 frames. 2024-08-12 08:05:59,659 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on SV_voxceleb1: loss=0.004943, beats_loss=0, ecapa_loss=0.0004943, whisper_loss=0, over 939242.00 frames. 2024-08-12 08:07:53,009 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on AT_audioset: loss=0.02436, beats_loss=0.02436, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 08:07:53,013 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 08:08:04,995 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 08:08:16,325 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 08:08:24,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1539460.0, ans=0.1 2024-08-12 08:08:29,778 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 08:08:34,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1539460.0, ans=0.125 2024-08-12 08:08:35,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1539560.0, ans=0.0 2024-08-12 08:08:47,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1539560.0, ans=0.125 2024-08-12 08:08:50,245 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 08:09:05,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9050, loss[loss=0.08641, beats_loss=0.01175, ecapa_loss=0.0001948, whisper_loss=0.07271, over 16590.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01112, ecapa_loss=0.0001795, whisper_loss=0.0923, over 3878618.28 frames. 
], batch size: 67, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:09:06,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1539760.0, ans=0.1 2024-08-12 08:09:09,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1539760.0, ans=0.1 2024-08-12 08:09:19,342 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 08:09:40,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1539960.0, ans=0.0 2024-08-12 08:09:51,003 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.97 vs. limit=10.0 2024-08-12 08:09:56,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1540060.0, ans=0.09899494936611666 2024-08-12 08:10:05,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1540160.0, ans=0.1 2024-08-12 08:10:09,422 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.551e+01 2.907e+01 3.420e+01 5.824e+01, threshold=5.813e+01, percent-clipped=1.0 2024-08-12 08:10:16,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1540160.0, ans=0.125 2024-08-12 08:10:17,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1540160.0, ans=0.125 2024-08-12 08:10:19,748 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9100, loss[loss=0.1178, beats_loss=0.01181, ecapa_loss=0.0002321, whisper_loss=0.1037, over 20747.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01106, ecapa_loss=0.0001813, whisper_loss=0.09208, over 3853451.28 frames. ], batch size: 89, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:10:43,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.45 vs. limit=15.0 2024-08-12 08:10:43,496 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 08:10:55,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=1540460.0, ans=0.02 2024-08-12 08:10:57,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1540460.0, ans=0.0 2024-08-12 08:11:13,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.53 vs. limit=22.5 2024-08-12 08:11:14,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1540560.0, ans=0.125 2024-08-12 08:11:20,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1540660.0, ans=0.95 2024-08-12 08:11:26,102 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 08:11:33,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9150, loss[loss=0.1039, beats_loss=0.01332, ecapa_loss=0.0001747, whisper_loss=0.08882, over 22231.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01103, ecapa_loss=0.0001802, whisper_loss=0.09265, over 3874327.30 frames. ], batch size: 92, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:11:33,378 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
17 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 08:11:40,726 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 14 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-12 08:12:00,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1540860.0, ans=0.5 2024-08-12 08:12:02,819 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 08:12:03,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1540960.0, ans=0.125 2024-08-12 08:12:35,885 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.613e+01 2.813e+01 3.154e+01 4.389e+01, threshold=5.626e+01, percent-clipped=0.0 2024-08-12 08:12:41,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-08-12 08:12:46,204 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9200, loss[loss=0.1288, beats_loss=0.009459, ecapa_loss=0.0001469, whisper_loss=0.1178, over 19206.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01105, ecapa_loss=0.00018, whisper_loss=0.09243, over 3896261.03 frames. ], batch size: 71, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:12:50,798 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 08:12:56,621 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-12 08:13:00,701 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 08:13:06,436 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 08:13:14,538 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
27 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-12 08:13:36,806 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 14 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-12 08:13:55,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1541660.0, ans=0.0 2024-08-12 08:13:57,694 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9250, loss[loss=0.1015, beats_loss=0.01036, ecapa_loss=0.000203, whisper_loss=0.08909, over 20284.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01114, ecapa_loss=0.0001787, whisper_loss=0.09156, over 3878725.25 frames. ], batch size: 84, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:13:59,312 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 08:14:02,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1541760.0, ans=0.025 2024-08-12 08:14:03,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1541760.0, ans=0.125 2024-08-12 08:14:11,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1541860.0, ans=0.125 2024-08-12 08:14:34,313 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 08:14:35,734 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 25 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-12 08:14:38,441 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 08:14:46,984 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 08:14:47,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1542060.0, ans=0.1 2024-08-12 08:14:48,480 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 08:14:54,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1542060.0, ans=0.125 2024-08-12 08:14:58,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1542160.0, ans=0.125 2024-08-12 08:14:58,712 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.06 vs. limit=22.5 2024-08-12 08:15:00,716 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.527e+01 2.843e+01 3.278e+01 5.057e+01, threshold=5.687e+01, percent-clipped=0.0 2024-08-12 08:15:03,032 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2024-08-12 08:15:07,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1542160.0, ans=0.2 2024-08-12 08:15:07,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.26 vs. limit=15.0 2024-08-12 08:15:09,853 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 08:15:10,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1542260.0, ans=0.125 2024-08-12 08:15:11,003 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9300, loss[loss=0.12, beats_loss=0.01134, ecapa_loss=0.0001675, whisper_loss=0.107, over 22544.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01111, ecapa_loss=0.0001787, whisper_loss=0.09152, over 3864629.76 frames. ], batch size: 88, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:15:18,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1542260.0, ans=0.0 2024-08-12 08:15:33,046 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 08:15:35,943 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 08:15:37,606 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 08:15:47,972 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 08:15:50,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.99 vs. limit=22.5 2024-08-12 08:16:26,544 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9350, loss[loss=0.0887, beats_loss=0.01191, ecapa_loss=0.0001736, whisper_loss=0.07505, over 17388.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01112, ecapa_loss=0.0001791, whisper_loss=0.09094, over 3855184.04 frames. 
], batch size: 70, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:16:34,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1542760.0, ans=0.125 2024-08-12 08:16:34,793 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-12 08:16:36,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1542760.0, ans=0.2 2024-08-12 08:16:42,074 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 08:16:42,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1542860.0, ans=0.0 2024-08-12 08:16:53,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0 2024-08-12 08:16:54,470 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 34 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 08:17:19,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.54 vs. limit=10.0 2024-08-12 08:17:21,334 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 08:17:31,953 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.553e+01 2.819e+01 3.364e+01 6.243e+01, threshold=5.639e+01, percent-clipped=2.0 2024-08-12 08:17:32,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1543160.0, ans=0.2 2024-08-12 08:17:43,110 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9400, loss[loss=0.1144, beats_loss=0.009151, ecapa_loss=0.0001701, whisper_loss=0.1035, over 22226.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01109, ecapa_loss=0.0001785, whisper_loss=0.09158, over 3850930.42 frames. ], batch size: 84, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:17:58,213 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-12 08:18:04,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1543360.0, ans=0.04949747468305833 2024-08-12 08:18:13,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1543460.0, ans=0.0 2024-08-12 08:18:30,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1543560.0, ans=0.1 2024-08-12 08:18:35,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1543560.0, ans=0.1 2024-08-12 08:18:36,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1543560.0, ans=0.2 2024-08-12 08:18:48,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1543660.0, ans=0.2 2024-08-12 08:18:52,692 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-12 08:18:58,827 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9450, loss[loss=0.09043, beats_loss=0.009312, ecapa_loss=0.0002097, whisper_loss=0.07903, over 17682.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01113, ecapa_loss=0.0001803, whisper_loss=0.09151, over 3860100.46 frames. ], batch size: 74, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:19:08,337 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
14 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 08:19:13,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1543860.0, ans=0.0 2024-08-12 08:19:37,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1543960.0, ans=0.125 2024-08-12 08:19:40,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1543960.0, ans=0.1 2024-08-12 08:19:46,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=22.5 2024-08-12 08:19:47,423 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-12 08:19:52,273 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 08:19:52,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1544060.0, ans=0.125 2024-08-12 08:19:52,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1544060.0, ans=0.125 2024-08-12 08:20:00,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1544160.0, ans=0.125 2024-08-12 08:20:01,425 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.483e+01 2.821e+01 3.317e+01 4.965e+01, threshold=5.642e+01, percent-clipped=0.0 2024-08-12 08:20:11,656 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9500, loss[loss=0.1107, beats_loss=0.009712, ecapa_loss=0.0001652, whisper_loss=0.09932, over 17167.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01103, ecapa_loss=0.0001821, whisper_loss=0.09133, over 3825421.27 frames. 
], batch size: 64, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:20:24,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1544360.0, ans=0.0 2024-08-12 08:21:24,203 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9550, loss[loss=0.09181, beats_loss=0.01208, ecapa_loss=0.0001691, whisper_loss=0.07804, over 21982.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01105, ecapa_loss=0.0001819, whisper_loss=0.09133, over 3835977.16 frames. ], batch size: 92, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:21:42,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=1544860.0, ans=0.1 2024-08-12 08:21:44,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1544860.0, ans=0.0 2024-08-12 08:22:05,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1544960.0, ans=0.0 2024-08-12 08:22:11,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1545060.0, ans=0.2 2024-08-12 08:22:26,568 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.581e+01 2.910e+01 3.415e+01 4.856e+01, threshold=5.819e+01, percent-clipped=0.0 2024-08-12 08:22:36,586 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9600, loss[loss=0.09801, beats_loss=0.01247, ecapa_loss=0.0001851, whisper_loss=0.08368, over 21998.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01102, ecapa_loss=0.0001813, whisper_loss=0.09144, over 3831212.84 frames. ], batch size: 92, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:22:47,009 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
25 from LS+wenet, 16 from Vox, 43 from AS 2024-08-12 08:22:47,774 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=12.0 2024-08-12 08:23:10,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1545460.0, ans=0.125 2024-08-12 08:23:12,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1545460.0, ans=0.0 2024-08-12 08:23:17,236 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 from AS 2024-08-12 08:23:33,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1545660.0, ans=0.125 2024-08-12 08:23:35,894 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 22 from LS+wenet, 27 from Vox, 36 from AS 2024-08-12 08:23:37,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1545660.0, ans=0.125 2024-08-12 08:23:44,944 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 27 from Vox, 25 from AS 2024-08-12 08:23:49,013 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2024-08-12 08:23:49,505 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9650, loss[loss=0.08604, beats_loss=0.01337, ecapa_loss=0.0001676, whisper_loss=0.07099, over 22228.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01101, ecapa_loss=0.0001806, whisper_loss=0.09138, over 3855295.32 frames. ], batch size: 91, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:23:49,699 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 23 from Vox, 40 from AS 2024-08-12 08:23:51,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1545760.0, ans=0.125 2024-08-12 08:23:52,429 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 20 from Vox, 27 from AS 2024-08-12 08:23:55,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1545760.0, ans=0.0 2024-08-12 08:23:58,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1545760.0, ans=0.1 2024-08-12 08:24:06,758 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 32 from Vox, 37 from AS 2024-08-12 08:24:11,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-08-12 08:24:35,356 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 28 from LS+wenet, 21 from Vox, 22 from AS 2024-08-12 08:24:37,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.26 vs. limit=15.0 2024-08-12 08:24:41,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1546060.0, ans=0.2 2024-08-12 08:24:50,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.490e+01 2.776e+01 3.280e+01 4.565e+01, threshold=5.551e+01, percent-clipped=0.0 2024-08-12 08:24:58,077 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 from AS 2024-08-12 08:25:00,793 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9700, loss[loss=0.1144, beats_loss=0.008644, ecapa_loss=0.0001696, whisper_loss=0.104, over 18939.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01091, ecapa_loss=0.000182, whisper_loss=0.09198, over 3869788.52 frames. ], batch size: 71, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:25:01,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=12.0 2024-08-12 08:25:04,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1546260.0, ans=0.125 2024-08-12 08:25:42,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1546560.0, ans=0.2 2024-08-12 08:25:43,628 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 18 from Vox, 32 from AS 2024-08-12 08:26:05,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2024-08-12 08:26:13,099 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9750, loss[loss=0.1163, beats_loss=0.01033, ecapa_loss=0.0001829, whisper_loss=0.1041, over 22330.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01086, ecapa_loss=0.0001804, whisper_loss=0.09257, over 3838691.18 frames. ], batch size: 92, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:26:13,601 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 16 from Vox, 45 from AS 2024-08-12 08:26:16,201 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 from AS 2024-08-12 08:26:28,902 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 23 from Vox, 45 from AS 2024-08-12 08:26:29,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.86 vs. 
limit=8.0 2024-08-12 08:26:30,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1546860.0, ans=0.125 2024-08-12 08:26:47,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1546960.0, ans=0.1 2024-08-12 08:26:47,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1546960.0, ans=0.0 2024-08-12 08:27:16,898 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.437e+01 2.801e+01 3.445e+01 6.244e+01, threshold=5.602e+01, percent-clipped=1.0 2024-08-12 08:27:18,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1547160.0, ans=0.0 2024-08-12 08:27:21,522 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 16 from Vox, 29 from AS 2024-08-12 08:27:25,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.25 vs. limit=22.5 2024-08-12 08:27:27,243 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9800, loss[loss=0.1106, beats_loss=0.008787, ecapa_loss=0.0001658, whisper_loss=0.1002, over 18497.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01098, ecapa_loss=0.0001789, whisper_loss=0.09229, over 3844914.00 frames. ], batch size: 69, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:27:48,148 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 from AS 2024-08-12 08:27:52,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.97 vs. limit=22.5 2024-08-12 08:27:54,410 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
25 from LS+wenet, 17 from Vox, 32 from AS 2024-08-12 08:28:39,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1547660.0, ans=0.125 2024-08-12 08:28:42,413 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9850, loss[loss=0.1115, beats_loss=0.009681, ecapa_loss=0.0001786, whisper_loss=0.1, over 23382.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.011, ecapa_loss=0.0001806, whisper_loss=0.09238, over 3879014.45 frames. ], batch size: 91, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:28:44,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1547760.0, ans=0.0 2024-08-12 08:28:51,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1547760.0, ans=0.0 2024-08-12 08:29:14,346 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 16 from Vox, 31 from AS 2024-08-12 08:29:16,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1547960.0, ans=0.0 2024-08-12 08:29:23,337 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 11 from Vox, 46 from AS 2024-08-12 08:29:43,427 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 from AS 2024-08-12 08:29:47,662 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.546e+01 2.857e+01 3.272e+01 5.247e+01, threshold=5.713e+01, percent-clipped=0.0 2024-08-12 08:29:52,214 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 14 from Vox, 27 from AS 2024-08-12 08:29:52,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1548160.0, ans=0.125 2024-08-12 08:29:57,834 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9900, loss[loss=0.0919, beats_loss=0.01235, ecapa_loss=0.0001678, whisper_loss=0.07787, over 22768.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01107, ecapa_loss=0.0001805, whisper_loss=0.09178, over 3852304.31 frames. ], batch size: 94, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:30:07,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1548260.0, ans=0.125 2024-08-12 08:30:08,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1548260.0, ans=0.05 2024-08-12 08:30:12,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1548360.0, ans=0.125 2024-08-12 08:30:14,650 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.60 vs. limit=12.0 2024-08-12 08:30:17,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1548360.0, ans=0.05 2024-08-12 08:30:24,517 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 from AS 2024-08-12 08:30:25,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2024-08-12 08:30:26,090 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
27 from LS+wenet, 15 from Vox, 29 from AS 2024-08-12 08:30:41,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1548560.0, ans=0.025 2024-08-12 08:30:47,503 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 32 from LS+wenet, 19 from Vox, 30 from AS 2024-08-12 08:30:58,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1548660.0, ans=0.1 2024-08-12 08:31:05,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1548660.0, ans=0.1 2024-08-12 08:31:08,304 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 11 from Vox, 27 from AS 2024-08-12 08:31:08,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1548660.0, ans=0.125 2024-08-12 08:31:08,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1548660.0, ans=0.1 2024-08-12 08:31:09,036 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2024-08-12 08:31:10,871 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 9950, loss[loss=0.1097, beats_loss=0.009505, ecapa_loss=0.0002008, whisper_loss=0.09822, over 21525.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0111, ecapa_loss=0.0001801, whisper_loss=0.09227, over 3879391.20 frames. ], batch size: 87, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:31:13,047 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
23 from LS+wenet, 19 from Vox, 32 from AS 2024-08-12 08:31:17,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1548760.0, ans=0.125 2024-08-12 08:31:22,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1548760.0, ans=0.0 2024-08-12 08:31:34,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1548860.0, ans=0.1 2024-08-12 08:31:37,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2024-08-12 08:31:47,193 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 from AS 2024-08-12 08:31:56,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1549060.0, ans=0.125 2024-08-12 08:32:06,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1549060.0, ans=0.125 2024-08-12 08:32:10,471 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 24 from Vox, 28 from AS 2024-08-12 08:32:14,670 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.498e+01 2.780e+01 3.249e+01 5.152e+01, threshold=5.559e+01, percent-clipped=0.0 2024-08-12 08:32:15,294 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:32:24,576 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10000, loss[loss=0.109, beats_loss=0.01038, ecapa_loss=0.0001853, whisper_loss=0.09672, over 22341.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.011, ecapa_loss=0.0001813, whisper_loss=0.09297, over 3898682.32 frames. 
], batch size: 92, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:32:58,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1549460.0, ans=0.1 2024-08-12 08:33:19,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0 2024-08-12 08:33:25,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1549660.0, ans=0.0 2024-08-12 08:33:38,552 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10050, loss[loss=0.1007, beats_loss=0.01044, ecapa_loss=0.0001388, whisper_loss=0.08886, over 19133.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01095, ecapa_loss=0.0001802, whisper_loss=0.09316, over 3894456.15 frames. ], batch size: 71, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:33:43,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1549760.0, ans=0.125 2024-08-12 08:33:52,882 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 24 from LS+wenet, 16 from Vox, 18 from AS 2024-08-12 08:34:01,851 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 22 from Vox, 38 from AS 2024-08-12 08:34:05,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.44 vs. limit=22.5 2024-08-12 08:34:09,415 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
27 from LS+wenet, 23 from Vox, 29 from AS 2024-08-12 08:34:09,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1549960.0, ans=0.025 2024-08-12 08:34:25,495 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:34:29,321 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 from AS 2024-08-12 08:34:32,453 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 31 from Vox, 24 from AS 2024-08-12 08:34:35,219 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 15 from Vox, 47 from AS 2024-08-12 08:34:40,790 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.497e+01 2.870e+01 3.338e+01 7.482e+01, threshold=5.741e+01, percent-clipped=1.0 2024-08-12 08:34:44,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1550160.0, ans=0.0 2024-08-12 08:34:44,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1550160.0, ans=0.125 2024-08-12 08:34:49,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1550160.0, ans=0.2 2024-08-12 08:34:51,423 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10100, loss[loss=0.09925, beats_loss=0.009327, ecapa_loss=0.0002128, whisper_loss=0.08779, over 22940.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01097, ecapa_loss=0.0001809, whisper_loss=0.09329, over 3933299.11 frames. ], batch size: 92, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:35:04,045 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
22 from LS+wenet, 15 from Vox, 27 from AS 2024-08-12 08:35:10,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1550360.0, ans=0.0 2024-08-12 08:35:13,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1550360.0, ans=0.125 2024-08-12 08:35:14,352 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 15 from LS+wenet, 18 from Vox, 40 from AS 2024-08-12 08:35:15,726 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 from AS 2024-08-12 08:35:16,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1550360.0, ans=0.07 2024-08-12 08:35:24,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1550460.0, ans=0.2 2024-08-12 08:35:31,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1550460.0, ans=0.0 2024-08-12 08:35:31,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1550460.0, ans=0.2 2024-08-12 08:35:36,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=15.0 2024-08-12 08:35:46,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1550560.0, ans=0.125 2024-08-12 08:35:51,589 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 28 from Vox, 34 from AS 2024-08-12 08:36:02,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1550660.0, ans=0.125 2024-08-12 08:36:04,989 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10150, loss[loss=0.1289, beats_loss=0.009567, ecapa_loss=0.0001183, whisper_loss=0.1182, over 20958.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01091, ecapa_loss=0.0001822, whisper_loss=0.09317, over 3921627.63 frames. ], batch size: 75, lr: 5.68e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:36:05,698 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.326e+00 2024-08-12 08:36:10,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1550760.0, ans=0.0 2024-08-12 08:36:23,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1550860.0, ans=0.125 2024-08-12 08:36:27,810 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 from AS 2024-08-12 08:36:40,853 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 from AS 2024-08-12 08:36:44,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1550960.0, ans=0.125 2024-08-12 08:36:55,088 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 from AS 2024-08-12 08:36:57,100 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
22 from LS+wenet, 15 from Vox, 46 from AS 2024-08-12 08:37:10,147 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.557e+01 2.799e+01 3.287e+01 1.688e+02, threshold=5.598e+01, percent-clipped=1.0 2024-08-12 08:37:21,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10200, loss[loss=0.1201, beats_loss=0.009423, ecapa_loss=0.0001683, whisper_loss=0.109, over 20501.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01099, ecapa_loss=0.0001817, whisper_loss=0.09249, over 3910610.63 frames. ], batch size: 79, lr: 5.68e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:37:24,733 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 31 from LS+wenet, 15 from Vox, 40 from AS 2024-08-12 08:37:28,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1551260.0, ans=0.035 2024-08-12 08:37:30,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.64 vs. limit=15.0 2024-08-12 08:37:31,476 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-08-12 08:37:39,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1551360.0, ans=0.125 2024-08-12 08:37:41,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1551360.0, ans=0.5 2024-08-12 08:37:42,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-08-12 08:37:44,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.78 vs. 
limit=15.0 2024-08-12 08:37:49,894 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 14 from Vox, 34 from AS 2024-08-12 08:38:09,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1551560.0, ans=0.0 2024-08-12 08:38:19,515 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.053e+00 2024-08-12 08:38:32,800 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 from AS 2024-08-12 08:38:38,563 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10250, loss[loss=0.07513, beats_loss=0.01385, ecapa_loss=0.0001694, whisper_loss=0.05958, over 14858.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01104, ecapa_loss=0.0001804, whisper_loss=0.0925, over 3932694.23 frames. ], batch size: 62, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:38:54,207 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 10 from LS+wenet, 20 from Vox, 34 from AS 2024-08-12 08:38:55,385 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0 2024-08-12 08:39:10,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1551960.0, ans=0.125 2024-08-12 08:39:11,537 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 23 from Vox, 23 from AS 2024-08-12 08:39:19,210 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS 2024-08-12 08:39:22,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1551960.0, ans=0.125 2024-08-12 08:39:31,013 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
22 from LS+wenet, 22 from Vox, 45 from AS 2024-08-12 08:39:34,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1552060.0, ans=0.05 2024-08-12 08:39:46,434 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.422e+01 2.707e+01 3.104e+01 5.382e+01, threshold=5.414e+01, percent-clipped=0.0 2024-08-12 08:39:57,307 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10300, loss[loss=0.1112, beats_loss=0.01096, ecapa_loss=0.0001395, whisper_loss=0.09883, over 15971.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01103, ecapa_loss=0.0001804, whisper_loss=0.09193, over 3906549.66 frames. ], batch size: 62, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:40:26,074 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 19 from Vox, 51 from AS 2024-08-12 08:40:37,370 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.64 vs. limit=6.0 2024-08-12 08:40:49,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1552560.0, ans=0.125 2024-08-12 08:40:54,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1552560.0, ans=0.1 2024-08-12 08:41:03,490 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 13 from Vox, 44 from AS 2024-08-12 08:41:05,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1552660.0, ans=0.2 2024-08-12 08:41:13,794 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10350, loss[loss=0.1047, beats_loss=0.01135, ecapa_loss=0.0001662, whisper_loss=0.09164, over 20882.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01098, ecapa_loss=0.0001804, whisper_loss=0.09275, over 3925447.27 frames. 
], batch size: 82, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:41:18,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1552760.0, ans=0.2 2024-08-12 08:41:33,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1552860.0, ans=0.2 2024-08-12 08:41:43,777 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 from AS 2024-08-12 08:42:06,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1553060.0, ans=0.2 2024-08-12 08:42:17,140 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.587e+01 2.793e+01 3.199e+01 6.798e+01, threshold=5.587e+01, percent-clipped=1.0 2024-08-12 08:42:27,521 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10400, loss[loss=0.08708, beats_loss=0.0141, ecapa_loss=0.0001507, whisper_loss=0.07148, over 14653.00 frames. ], tot_loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.0001809, whisper_loss=0.09214, over 3915541.56 frames. ], batch size: 58, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:42:33,851 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 from AS 2024-08-12 08:42:34,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1553260.0, ans=0.1 2024-08-12 08:42:37,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1553260.0, ans=0.125 2024-08-12 08:42:55,158 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
16 from LS+wenet, 20 from Vox, 32 from AS 2024-08-12 08:42:55,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1553360.0, ans=0.125 2024-08-12 08:42:58,358 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 23 from Vox, 48 from AS 2024-08-12 08:43:01,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1553460.0, ans=0.0 2024-08-12 08:43:02,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1553460.0, ans=0.125 2024-08-12 08:43:08,396 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 from AS 2024-08-12 08:43:13,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2024-08-12 08:43:19,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1553560.0, ans=0.0 2024-08-12 08:43:32,194 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 from AS 2024-08-12 08:43:35,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1553660.0, ans=0.0 2024-08-12 08:43:43,260 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10450, loss[loss=0.1118, beats_loss=0.009035, ecapa_loss=0.0001532, whisper_loss=0.1013, over 18973.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01095, ecapa_loss=0.00018, whisper_loss=0.09178, over 3880420.94 frames. 
], batch size: 71, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:43:54,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1553760.0, ans=0.125 2024-08-12 08:44:16,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.33 vs. limit=10.0 2024-08-12 08:44:17,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1553960.0, ans=0.0 2024-08-12 08:44:20,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1553960.0, ans=0.0 2024-08-12 08:44:33,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1554060.0, ans=0.125 2024-08-12 08:44:48,710 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.496e+01 2.841e+01 3.416e+01 4.859e+01, threshold=5.681e+01, percent-clipped=0.0 2024-08-12 08:44:49,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1554160.0, ans=0.0 2024-08-12 08:44:51,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1554160.0, ans=0.125 2024-08-12 08:44:57,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1554160.0, ans=0.0 2024-08-12 08:44:59,541 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10500, loss[loss=0.09506, beats_loss=0.01206, ecapa_loss=0.0001873, whisper_loss=0.08112, over 20032.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01105, ecapa_loss=0.0001807, whisper_loss=0.09142, over 3896357.16 frames. 
], batch size: 84, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:45:14,407 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-12 08:45:21,980 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 08:45:31,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1554460.0, ans=0.0 2024-08-12 08:45:42,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1554560.0, ans=0.125 2024-08-12 08:45:46,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1554560.0, ans=0.5 2024-08-12 08:45:53,406 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 08:46:11,930 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 08:46:12,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1554760.0, ans=0.2 2024-08-12 08:46:12,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1554760.0, ans=0.0 2024-08-12 08:46:12,906 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10550, loss[loss=0.09776, beats_loss=0.01022, ecapa_loss=0.0001801, whisper_loss=0.08574, over 21257.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01109, ecapa_loss=0.0001815, whisper_loss=0.0907, over 3889864.73 frames. 
], batch size: 89, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:46:16,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1554760.0, ans=0.2 2024-08-12 08:46:28,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1554860.0, ans=0.0 2024-08-12 08:46:44,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1554960.0, ans=10.0 2024-08-12 08:46:48,642 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 29 from LS+wenet, 9 from Vox, 32 fro AS 2024-08-12 08:46:54,612 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 08:47:02,230 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-12 08:47:05,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1555060.0, ans=0.125 2024-08-12 08:47:18,764 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.542e+01 2.754e+01 3.046e+01 4.371e+01, threshold=5.507e+01, percent-clipped=0.0 2024-08-12 08:47:23,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1555160.0, ans=0.0 2024-08-12 08:47:24,794 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 08:47:25,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1555160.0, ans=0.1 2024-08-12 08:47:29,392 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10600, loss[loss=0.1084, beats_loss=0.01018, ecapa_loss=0.0001595, whisper_loss=0.09665, over 23205.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01114, ecapa_loss=0.0001815, whisper_loss=0.09096, over 3906258.32 frames. ], batch size: 93, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:47:56,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1555360.0, ans=0.0 2024-08-12 08:47:57,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1555360.0, ans=0.125 2024-08-12 08:48:17,472 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 08:48:30,242 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 08:48:38,957 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 08:48:43,073 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10650, loss[loss=0.08674, beats_loss=0.01305, ecapa_loss=0.0001987, whisper_loss=0.0717, over 16420.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01113, ecapa_loss=0.0001801, whisper_loss=0.09134, over 3897085.56 frames. ], batch size: 69, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:48:54,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1555760.0, ans=0.0 2024-08-12 08:48:54,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1555760.0, ans=0.125 2024-08-12 08:49:16,337 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 08:49:47,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.652e+01 2.957e+01 3.394e+01 5.576e+01, threshold=5.914e+01, percent-clipped=1.0 2024-08-12 08:49:56,931 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 08:49:58,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10700, loss[loss=0.09811, beats_loss=0.01209, ecapa_loss=0.0001617, whisper_loss=0.0844, over 15310.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01109, ecapa_loss=0.0001786, whisper_loss=0.09225, over 3903789.41 frames. ], batch size: 61, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:50:15,462 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-12 08:51:04,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1556660.0, ans=0.125 2024-08-12 08:51:10,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1556660.0, ans=0.0 2024-08-12 08:51:12,903 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10750, loss[loss=0.1041, beats_loss=0.01095, ecapa_loss=0.0001872, whisper_loss=0.09124, over 19835.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0111, ecapa_loss=0.000179, whisper_loss=0.09267, over 3885936.64 frames. ], batch size: 82, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:51:19,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1556760.0, ans=0.125 2024-08-12 08:51:25,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.83 vs. limit=10.0 2024-08-12 08:51:38,920 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 08:51:41,740 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 08:51:44,605 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 08:51:46,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1556960.0, ans=0.125 2024-08-12 08:51:56,134 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 08:52:02,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1557060.0, ans=0.125 2024-08-12 08:52:17,029 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.513e+01 2.826e+01 3.158e+01 5.993e+01, threshold=5.652e+01, percent-clipped=1.0 2024-08-12 08:52:19,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1557160.0, ans=0.125 2024-08-12 08:52:28,017 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10800, loss[loss=0.1156, beats_loss=0.01076, ecapa_loss=0.0001974, whisper_loss=0.1029, over 20422.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01105, ecapa_loss=0.0001794, whisper_loss=0.09284, over 3896705.99 frames. ], batch size: 79, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:52:41,745 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 08:52:41,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1557360.0, ans=0.125 2024-08-12 08:53:19,625 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 08:53:38,651 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 08:53:42,623 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10850, loss[loss=0.1194, beats_loss=0.009325, ecapa_loss=0.000186, whisper_loss=0.1082, over 19849.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.01108, ecapa_loss=0.0001785, whisper_loss=0.09293, over 3906294.65 frames. ], batch size: 75, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:53:48,804 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 08:54:21,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1557960.0, ans=0.025 2024-08-12 08:54:27,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1558060.0, ans=0.125 2024-08-12 08:54:47,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1558160.0, ans=0.125 2024-08-12 08:54:47,771 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.594e+01 2.957e+01 3.345e+01 7.139e+01, threshold=5.915e+01, percent-clipped=2.0 2024-08-12 08:54:51,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1558160.0, ans=0.0 2024-08-12 08:54:54,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1558160.0, ans=0.0 2024-08-12 08:54:59,341 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10900, loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001741, whisper_loss=0.09021, over 19702.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01107, ecapa_loss=0.000179, whisper_loss=0.09308, over 3915270.92 frames. ], batch size: 76, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:55:10,833 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=15.0 2024-08-12 08:55:37,906 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 08:55:48,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1558560.0, ans=0.125 2024-08-12 08:56:01,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1558560.0, ans=0.1 2024-08-12 08:56:01,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1558560.0, ans=0.04949747468305833 2024-08-12 08:56:18,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 10950, loss[loss=0.1084, beats_loss=0.01227, ecapa_loss=0.0001222, whisper_loss=0.09488, over 15799.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01105, ecapa_loss=0.0001783, whisper_loss=0.09298, over 3885635.43 frames. ], batch size: 59, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:56:26,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1558760.0, ans=0.125 2024-08-12 08:56:31,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1558760.0, ans=0.0 2024-08-12 08:56:41,926 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 08:56:48,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.83 vs. limit=15.0 2024-08-12 08:56:50,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1558860.0, ans=0.0 2024-08-12 08:56:57,233 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 10 from Vox, 39 fro AS 2024-08-12 08:57:10,184 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 08:57:14,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1559060.0, ans=0.2 2024-08-12 08:57:16,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1559060.0, ans=0.1 2024-08-12 08:57:17,972 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 08:57:36,007 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-12 08:57:36,528 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.556e+01 2.763e+01 3.156e+01 4.815e+01, threshold=5.526e+01, percent-clipped=0.0 2024-08-12 08:57:44,570 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 08:57:46,648 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 08:57:50,291 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11000, loss[loss=0.08511, beats_loss=0.01365, ecapa_loss=0.0001604, whisper_loss=0.06985, over 16272.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01106, ecapa_loss=0.0001778, whisper_loss=0.09263, over 3909323.41 frames. ], batch size: 64, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:57:54,441 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2024-08-12 08:57:55,672 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
23 from LS+wenet, 21 from Vox, 9 fro AS 2024-08-12 08:58:10,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1559360.0, ans=0.0 2024-08-12 08:58:14,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-12 08:58:16,814 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.97 vs. limit=12.0 2024-08-12 08:58:26,025 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 08:58:37,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1559460.0, ans=0.2 2024-08-12 08:59:01,399 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 08:59:13,498 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11050, loss[loss=0.1013, beats_loss=0.01452, ecapa_loss=0.0001215, whisper_loss=0.08553, over 22076.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01109, ecapa_loss=0.0001774, whisper_loss=0.09284, over 3927792.59 frames. ], batch size: 85, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:59:19,222 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 08:59:26,851 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 08:59:28,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1559760.0, ans=0.125 2024-08-12 08:59:39,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1559860.0, ans=0.125 2024-08-12 09:00:01,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1559960.0, ans=0.1 2024-08-12 09:00:20,422 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2024-08-12 09:00:21,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1560060.0, ans=0.125 2024-08-12 09:00:26,191 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 09:00:34,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1560060.0, ans=0.125 2024-08-12 09:00:45,331 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 16 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 09:00:49,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.392e+01 2.745e+01 3.211e+01 4.714e+01, threshold=5.490e+01, percent-clipped=0.0 2024-08-12 09:01:04,740 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11100, loss[loss=0.09015, beats_loss=0.0142, ecapa_loss=0.0001243, whisper_loss=0.07471, over 17031.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01103, ecapa_loss=0.0001784, whisper_loss=0.09286, over 3893779.11 frames. 
], batch size: 66, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:01:35,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1560360.0, ans=0.0 2024-08-12 09:01:56,004 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 18 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-12 09:02:10,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1560460.0, ans=0.125 2024-08-12 09:02:35,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1560560.0, ans=0.0 2024-08-12 09:02:47,810 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 38 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 09:02:59,251 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11150, loss[loss=0.1089, beats_loss=0.01038, ecapa_loss=0.0001431, whisper_loss=0.09706, over 21977.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.011, ecapa_loss=0.0001771, whisper_loss=0.09282, over 3893907.39 frames. ], batch size: 83, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:03:16,378 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 09:03:18,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1560760.0, ans=0.125 2024-08-12 09:03:20,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1560860.0, ans=0.125 2024-08-12 09:04:17,978 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 09:04:21,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1561060.0, ans=0.125 2024-08-12 09:04:29,414 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.591e+01 2.914e+01 3.431e+01 1.120e+02, threshold=5.828e+01, percent-clipped=1.0 2024-08-12 09:04:39,843 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=22.5 2024-08-12 09:04:40,136 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11200, loss[loss=0.1149, beats_loss=0.01154, ecapa_loss=0.0001759, whisper_loss=0.1016, over 20207.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01103, ecapa_loss=0.0001778, whisper_loss=0.09214, over 3899649.63 frames. ], batch size: 80, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:05:24,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1561560.0, ans=0.0 2024-08-12 09:05:50,760 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 09:05:52,227 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 09:05:55,281 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11250, loss[loss=0.0952, beats_loss=0.01139, ecapa_loss=0.000173, whisper_loss=0.08208, over 19664.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01106, ecapa_loss=0.0001772, whisper_loss=0.09266, over 3914677.54 frames. ], batch size: 77, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:06:09,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1561860.0, ans=0.125 2024-08-12 09:06:23,089 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
24 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 09:06:48,728 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 20 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-12 09:06:50,008 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 09:06:58,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1562160.0, ans=0.1 2024-08-12 09:07:00,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.463e+01 2.812e+01 3.090e+01 4.861e+01, threshold=5.624e+01, percent-clipped=0.0 2024-08-12 09:07:03,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1562160.0, ans=0.125 2024-08-12 09:07:08,933 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-12 09:07:12,178 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11300, loss[loss=0.1071, beats_loss=0.01031, ecapa_loss=0.0001778, whisper_loss=0.09497, over 21978.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01109, ecapa_loss=0.0001771, whisper_loss=0.09245, over 3883838.60 frames. ], batch size: 89, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:07:18,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1562260.0, ans=0.0 2024-08-12 09:07:20,584 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0 2024-08-12 09:07:21,984 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.33 vs. 
limit=10.0 2024-08-12 09:07:26,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-12 09:08:16,937 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 09:08:27,178 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11350, loss[loss=0.1184, beats_loss=0.009313, ecapa_loss=0.0002128, whisper_loss=0.107, over 19310.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01105, ecapa_loss=0.0001771, whisper_loss=0.09279, over 3901074.51 frames. ], batch size: 79, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:08:40,874 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 09:08:48,782 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2024-08-12 09:08:49,500 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 09:08:58,250 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-12 09:09:02,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1562960.0, ans=0.1 2024-08-12 09:09:09,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.87 vs. limit=22.5 2024-08-12 09:09:10,263 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-12 09:09:13,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1563060.0, ans=0.2 2024-08-12 09:09:14,732 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
16 from LS+wenet, 9 from Vox, 32 fro AS 2024-08-12 09:09:18,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1563060.0, ans=0.125 2024-08-12 09:09:19,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0 2024-08-12 09:09:25,065 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 09:09:31,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1563160.0, ans=0.125 2024-08-12 09:09:32,432 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.664e+01 2.943e+01 3.527e+01 6.465e+01, threshold=5.886e+01, percent-clipped=3.0 2024-08-12 09:09:35,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1563160.0, ans=0.1 2024-08-12 09:09:43,156 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11400, loss[loss=0.09198, beats_loss=0.01286, ecapa_loss=0.0001895, whisper_loss=0.07723, over 17087.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0111, ecapa_loss=0.0001775, whisper_loss=0.09283, over 3887964.76 frames. ], batch size: 73, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:09:49,677 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-12 09:10:01,344 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 31 from Vox, 20 fro AS 2024-08-12 09:10:04,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1563360.0, ans=0.07 2024-08-12 09:10:15,537 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
21 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-12 09:10:30,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1563560.0, ans=0.1 2024-08-12 09:10:46,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.51 vs. limit=15.0 2024-08-12 09:10:51,230 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-12 09:10:52,542 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 11 from Vox, 42 fro AS 2024-08-12 09:10:58,696 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11450, loss[loss=0.1037, beats_loss=0.01314, ecapa_loss=0.000202, whisper_loss=0.08853, over 18616.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01115, ecapa_loss=0.0001779, whisper_loss=0.09262, over 3894361.17 frames. ], batch size: 77, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:11:13,912 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.68 vs. limit=10.0 2024-08-12 09:11:20,030 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.67 vs. limit=22.5 2024-08-12 09:11:23,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1563860.0, ans=0.125 2024-08-12 09:11:31,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1563960.0, ans=0.125 2024-08-12 09:11:38,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1563960.0, ans=0.0 2024-08-12 09:11:50,402 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 09:12:01,918 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.222e+01 2.691e+01 2.984e+01 3.648e+01 5.377e+01, threshold=5.967e+01, percent-clipped=0.0 2024-08-12 09:12:12,742 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11500, loss[loss=0.1017, beats_loss=0.01135, ecapa_loss=0.0001467, whisper_loss=0.08887, over 18052.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01107, ecapa_loss=0.000178, whisper_loss=0.0931, over 3887403.07 frames. ], batch size: 71, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:12:28,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=15.0 2024-08-12 09:12:43,234 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.44 vs. limit=22.5 2024-08-12 09:12:43,306 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2024-08-12 09:12:50,152 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 09:13:01,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1564560.0, ans=0.125 2024-08-12 09:13:07,029 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-08-12 09:13:26,649 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11550, loss[loss=0.1146, beats_loss=0.006934, ecapa_loss=0.0002188, whisper_loss=0.1054, over 19372.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01105, ecapa_loss=0.0001789, whisper_loss=0.0935, over 3900756.38 frames. 
], batch size: 76, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:13:26,948 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 26 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-12 09:13:38,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1564760.0, ans=0.125 2024-08-12 09:13:57,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1564960.0, ans=0.1 2024-08-12 09:14:12,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.46 vs. limit=10.0 2024-08-12 09:14:21,885 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 09:14:32,112 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.539e+01 2.783e+01 3.251e+01 6.274e+01, threshold=5.566e+01, percent-clipped=2.0 2024-08-12 09:14:36,873 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:14:38,207 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:14:41,861 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11600, loss[loss=0.1065, beats_loss=0.01082, ecapa_loss=0.0001554, whisper_loss=0.09415, over 20254.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01102, ecapa_loss=0.0001793, whisper_loss=0.09311, over 3900823.70 frames. ], batch size: 76, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:14:51,031 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 09:14:51,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.63 vs. 
limit=10.0 2024-08-12 09:14:59,677 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 09:15:13,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1565460.0, ans=0.0 2024-08-12 09:15:17,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1565460.0, ans=0.0 2024-08-12 09:15:20,266 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.042e-01 2024-08-12 09:15:22,432 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 09:15:25,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1565560.0, ans=0.0 2024-08-12 09:15:37,900 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 09:15:43,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1565660.0, ans=0.1 2024-08-12 09:15:52,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1565760.0, ans=0.0 2024-08-12 09:15:52,887 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11650, loss[loss=0.1092, beats_loss=0.01028, ecapa_loss=0.000229, whisper_loss=0.09665, over 21724.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01099, ecapa_loss=0.0001796, whisper_loss=0.09329, over 3907271.22 frames. 
], batch size: 89, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:15:53,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1565760.0, ans=0.0 2024-08-12 09:16:03,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1565760.0, ans=0.125 2024-08-12 09:16:11,592 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2024-08-12 09:16:14,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1565860.0, ans=0.0 2024-08-12 09:16:15,664 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 09:16:30,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1565960.0, ans=0.0 2024-08-12 09:16:35,416 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 33 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 09:16:36,807 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-12 09:16:49,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1566160.0, ans=0.0 2024-08-12 09:16:53,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.448e+01 2.832e+01 3.122e+01 7.544e+01, threshold=5.665e+01, percent-clipped=2.0 2024-08-12 09:16:54,935 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 09:17:03,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11700, loss[loss=0.09516, beats_loss=0.01154, ecapa_loss=0.0001671, whisper_loss=0.08195, over 13509.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01108, ecapa_loss=0.0001799, whisper_loss=0.09274, over 3923097.64 frames. ], batch size: 54, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:17:04,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1566260.0, ans=0.0 2024-08-12 09:17:06,334 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 09:17:18,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1566360.0, ans=0.0 2024-08-12 09:17:22,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1566360.0, ans=0.125 2024-08-12 09:17:31,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.33 vs. limit=22.5 2024-08-12 09:17:37,524 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 09:17:50,976 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 09:17:57,011 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 09:18:09,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1566660.0, ans=0.0 2024-08-12 09:18:11,856 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11750, loss[loss=0.1155, beats_loss=0.01161, ecapa_loss=0.0001973, whisper_loss=0.1019, over 20781.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01116, ecapa_loss=0.0001785, whisper_loss=0.09284, over 3936923.35 frames. 
], batch size: 87, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:18:19,000 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.78 vs. limit=22.5 2024-08-12 09:18:42,556 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=12.0 2024-08-12 09:18:43,586 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:18:45,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1566960.0, ans=10.0 2024-08-12 09:19:11,759 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-12 09:19:12,868 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.551e+01 2.829e+01 3.227e+01 5.711e+01, threshold=5.658e+01, percent-clipped=1.0 2024-08-12 09:19:15,154 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2024-08-12 09:19:20,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1567160.0, ans=0.2 2024-08-12 09:19:22,603 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11800, loss[loss=0.1208, beats_loss=0.01078, ecapa_loss=0.0001988, whisper_loss=0.108, over 23100.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01119, ecapa_loss=0.0001774, whisper_loss=0.09261, over 3935966.21 frames. 
], batch size: 93, lr: 5.65e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:19:44,023 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.768e-01 2024-08-12 09:19:52,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1567460.0, ans=0.1 2024-08-12 09:19:58,725 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 09:20:02,936 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 09:20:19,451 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 09:20:31,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11850, loss[loss=0.1037, beats_loss=0.01312, ecapa_loss=0.0002083, whisper_loss=0.08853, over 21607.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01119, ecapa_loss=0.000177, whisper_loss=0.09239, over 3938082.74 frames. ], batch size: 90, lr: 5.65e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:20:39,947 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 09:20:43,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=22.5 2024-08-12 09:20:47,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1567860.0, ans=0.125 2024-08-12 09:20:48,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1567860.0, ans=0.0 2024-08-12 09:21:03,911 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.80 vs. 
limit=22.5 2024-08-12 09:21:20,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1568060.0, ans=0.125 2024-08-12 09:21:21,158 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:21:31,405 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.485e+01 2.770e+01 3.068e+01 4.213e+01, threshold=5.539e+01, percent-clipped=0.0 2024-08-12 09:21:37,229 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-12 09:21:39,559 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11900, loss[loss=0.1146, beats_loss=0.01147, ecapa_loss=0.0001746, whisper_loss=0.1014, over 22273.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01112, ecapa_loss=0.0001775, whisper_loss=0.09321, over 3938971.26 frames. ], batch size: 90, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:21:42,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1568260.0, ans=0.0 2024-08-12 09:21:48,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1568260.0, ans=0.1 2024-08-12 09:21:55,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1568360.0, ans=0.125 2024-08-12 09:21:56,822 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 09:22:04,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1568360.0, ans=0.125 2024-08-12 09:22:49,900 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 11950, loss[loss=0.106, beats_loss=0.01264, ecapa_loss=0.0001902, whisper_loss=0.09144, over 22213.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01101, ecapa_loss=0.0001789, whisper_loss=0.09331, over 3927665.77 frames. ], batch size: 89, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:22:57,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1568760.0, ans=0.0 2024-08-12 09:23:08,359 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 09:23:32,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2024-08-12 09:23:33,449 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 09:23:51,638 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.558e+01 2.859e+01 3.291e+01 5.466e+01, threshold=5.718e+01, percent-clipped=0.0 2024-08-12 09:23:51,943 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-12 09:24:00,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12000, loss[loss=0.0989, beats_loss=0.01235, ecapa_loss=0.0001929, whisper_loss=0.08462, over 23073.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01099, ecapa_loss=0.0001792, whisper_loss=0.09319, over 3910215.39 frames. 
], batch size: 94, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:24:00,143 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 09:24:39,946 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on ASR_libri: loss=0.2552, beats_loss=0, ecapa_loss=0.0006057, whisper_loss=0.2491, over 922467.00 frames. 2024-08-12 09:24:56,762 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on SV_voxceleb1: loss=0.004842, beats_loss=0, ecapa_loss=0.0004842, whisper_loss=0, over 939242.00 frames. 2024-08-12 09:26:51,088 INFO [train_multi_KD3.py:1149] (2/4) Epoch 11, validation on AT_audioset: loss=0.02454, beats_loss=0.02454, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 09:26:51,093 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 09:27:38,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1569560.0, ans=0.09899494936611666 2024-08-12 09:27:50,985 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-12 09:28:01,731 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12050, loss[loss=0.1006, beats_loss=0.0129, ecapa_loss=0.0001887, whisper_loss=0.08586, over 21114.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01094, ecapa_loss=0.0001805, whisper_loss=0.09314, over 3875463.73 frames. ], batch size: 84, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:28:22,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1569860.0, ans=6.0 2024-08-12 09:28:23,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.80 vs. limit=15.0 2024-08-12 09:28:24,231 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-12 09:28:48,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1570060.0, ans=0.0 2024-08-12 09:28:58,868 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2024-08-12 09:29:00,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=15.0 2024-08-12 09:29:03,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.535e+01 2.943e+01 3.446e+01 4.689e+01, threshold=5.887e+01, percent-clipped=0.0 2024-08-12 09:29:07,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1570160.0, ans=0.125 2024-08-12 09:29:11,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1570260.0, ans=0.125 2024-08-12 09:29:12,099 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12100, loss[loss=0.08944, beats_loss=0.01405, ecapa_loss=0.0001527, whisper_loss=0.07387, over 22996.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01094, ecapa_loss=0.0001808, whisper_loss=0.0931, over 3891514.57 frames. ], batch size: 93, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:29:12,278 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 09:29:17,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1570260.0, ans=0.0 2024-08-12 09:29:37,427 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 09:29:38,679 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
36 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 09:29:57,826 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-12 09:30:16,624 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-12 09:30:22,761 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12150, loss[loss=0.1008, beats_loss=0.01182, ecapa_loss=0.0001847, whisper_loss=0.08712, over 22497.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01106, ecapa_loss=0.0001793, whisper_loss=0.09256, over 3885495.24 frames. ], batch size: 93, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:30:30,421 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 09:30:42,844 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 37 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 09:30:57,854 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 09:31:23,506 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 09:31:25,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1571160.0, ans=0.125 2024-08-12 09:31:25,767 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.520e+01 2.822e+01 3.048e+01 5.048e+01, threshold=5.643e+01, percent-clipped=0.0 2024-08-12 09:31:34,916 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12200, loss[loss=0.06944, beats_loss=0.01179, ecapa_loss=0.0001637, whisper_loss=0.05601, over 15085.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.0001798, whisper_loss=0.09205, over 3888738.15 frames. 
], batch size: 62, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:31:51,378 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 09:32:01,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1571360.0, ans=0.125 2024-08-12 09:32:05,022 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 09:32:10,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1571460.0, ans=0.025 2024-08-12 09:32:47,412 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12250, loss[loss=0.101, beats_loss=0.01167, ecapa_loss=0.0001735, whisper_loss=0.0876, over 19628.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01088, ecapa_loss=0.00018, whisper_loss=0.09256, over 3871454.66 frames. ], batch size: 80, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:32:51,956 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 09:32:52,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1571760.0, ans=0.0 2024-08-12 09:33:05,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1571860.0, ans=0.125 2024-08-12 09:33:12,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1571860.0, ans=0.09899494936611666 2024-08-12 09:33:14,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1571860.0, ans=0.1 2024-08-12 09:33:14,362 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.928e+00 2024-08-12 09:33:18,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1571960.0, ans=0.125 2024-08-12 09:33:21,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1571960.0, ans=0.2 2024-08-12 09:33:25,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1571960.0, ans=0.125 2024-08-12 09:33:36,532 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 09:33:40,887 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 09:33:44,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.58 vs. 
limit=10.0 2024-08-12 09:33:47,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1572160.0, ans=0.125 2024-08-12 09:33:51,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1572160.0, ans=0.0 2024-08-12 09:33:51,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.601e+01 2.927e+01 3.328e+01 4.694e+01, threshold=5.855e+01, percent-clipped=0.0 2024-08-12 09:33:53,443 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 09:34:00,251 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12300, loss[loss=0.08206, beats_loss=0.01268, ecapa_loss=0.0002033, whisper_loss=0.06735, over 20549.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01099, ecapa_loss=0.00018, whisper_loss=0.09172, over 3878316.36 frames. ], batch size: 91, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:34:23,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1572360.0, ans=0.1 2024-08-12 09:34:42,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1572560.0, ans=0.2 2024-08-12 09:34:47,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2024-08-12 09:34:49,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1572560.0, ans=0.125 2024-08-12 09:35:12,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12350, loss[loss=0.1229, beats_loss=0.007639, ecapa_loss=0.0001638, whisper_loss=0.1137, over 18353.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01099, ecapa_loss=0.0001808, whisper_loss=0.09224, over 3905592.66 frames. 
], batch size: 69, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:35:13,328 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:35:17,077 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 09:35:19,720 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 09:35:26,603 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 34 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 09:35:32,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1572860.0, ans=0.0 2024-08-12 09:35:36,481 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 09:36:16,692 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.617e+01 3.064e+01 3.584e+01 5.581e+01, threshold=6.128e+01, percent-clipped=0.0 2024-08-12 09:36:20,360 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0 2024-08-12 09:36:23,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1573160.0, ans=0.0 2024-08-12 09:36:25,411 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12400, loss[loss=0.09743, beats_loss=0.01218, ecapa_loss=0.0001757, whisper_loss=0.08349, over 21791.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01096, ecapa_loss=0.0001789, whisper_loss=0.09245, over 3886269.92 frames. ], batch size: 88, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:36:32,917 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 09:36:37,550 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 09:36:47,217 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.468e-01 2024-08-12 09:36:48,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1573360.0, ans=0.0 2024-08-12 09:37:12,819 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 09:37:29,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1573660.0, ans=0.1 2024-08-12 09:37:38,187 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12450, loss[loss=0.1068, beats_loss=0.009678, ecapa_loss=0.0002109, whisper_loss=0.09502, over 15693.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01092, ecapa_loss=0.0001795, whisper_loss=0.0922, over 3875427.94 frames. ], batch size: 63, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:37:39,951 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 34 from Vox, 27 fro AS 2024-08-12 09:37:43,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1573760.0, ans=0.025 2024-08-12 09:37:47,155 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 09:37:57,009 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 09:38:22,652 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 09:38:22,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1574060.0, ans=0.125 2024-08-12 09:38:26,857 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 09:38:30,262 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2024-08-12 09:38:31,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1574060.0, ans=0.0 2024-08-12 09:38:34,128 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 25 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-12 09:38:40,948 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.466e+01 2.753e+01 3.048e+01 4.353e+01, threshold=5.506e+01, percent-clipped=0.0 2024-08-12 09:38:44,890 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2024-08-12 09:38:49,336 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12500, loss[loss=0.115, beats_loss=0.009408, ecapa_loss=0.0001993, whisper_loss=0.1035, over 22950.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0109, ecapa_loss=0.0001794, whisper_loss=0.09256, over 3888936.04 frames. ], batch size: 91, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:39:18,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1574460.0, ans=0.0 2024-08-12 09:39:26,432 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
37 from LS+wenet, 21 from Vox, 34 from AS
2024-08-12 09:39:47,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1574660.0, ans=0.125
2024-08-12 09:39:58,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1574760.0, ans=0.04949747468305833
2024-08-12 09:39:59,004 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12550, loss[loss=0.1143, beats_loss=0.01077, ecapa_loss=0.0001677, whisper_loss=0.1019, over 23680.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01095, ecapa_loss=0.0001793, whisper_loss=0.09277, over 3902736.48 frames. ], batch size: 94, lr: 5.64e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:40:07,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1574760.0, ans=0.0
2024-08-12 09:40:37,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1574960.0, ans=0.0
2024-08-12 09:40:56,405 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 12 from LS+wenet, 20 from Vox, 27 from AS
2024-08-12 09:41:01,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.516e+01 2.754e+01 3.207e+01 3.892e+01, threshold=5.508e+01, percent-clipped=0.0
2024-08-12 09:41:06,426 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 from AS
2024-08-12 09:41:06,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1575160.0, ans=0.2
2024-08-12 09:41:07,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.70 vs. limit=15.0
2024-08-12 09:41:10,453 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12600, loss[loss=0.1177, beats_loss=0.01185, ecapa_loss=0.0001744, whisper_loss=0.1041, over 21882.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01106, ecapa_loss=0.0001792, whisper_loss=0.09195, over 3894530.92 frames. ], batch size: 84, lr: 5.64e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:41:19,796 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 32 from Vox, 25 from AS
2024-08-12 09:41:24,020 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 24 from Vox, 27 from AS
2024-08-12 09:41:24,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1575360.0, ans=0.0
2024-08-12 09:41:32,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1575360.0, ans=0.1
2024-08-12 09:41:47,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1575460.0, ans=0.07
2024-08-12 09:41:48,215 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 24 from Vox, 25 from AS
2024-08-12 09:41:59,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1575560.0, ans=0.0
2024-08-12 09:42:01,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1575560.0, ans=0.0
2024-08-12 09:42:16,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1575660.0, ans=0.0
2024-08-12 09:42:20,514 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12650, loss[loss=0.1076, beats_loss=0.01171, ecapa_loss=0.0001856, whisper_loss=0.09405, over 19125.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01109, ecapa_loss=0.0001797, whisper_loss=0.09128, over 3871093.74 frames. ], batch size: 77, lr: 5.64e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:42:49,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1575960.0, ans=0.0
2024-08-12 09:42:52,782 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 from AS
2024-08-12 09:42:56,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1575960.0, ans=0.125
2024-08-12 09:42:59,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1575960.0, ans=0.1
2024-08-12 09:43:12,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1576060.0, ans=0.125
2024-08-12 09:43:22,171 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.519e+01 2.747e+01 3.019e+01 4.514e+01, threshold=5.494e+01, percent-clipped=0.0
2024-08-12 09:43:28,375 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 27 from Vox, 44 from AS
2024-08-12 09:43:30,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12700, loss[loss=0.1248, beats_loss=0.008652, ecapa_loss=0.0001786, whisper_loss=0.1143, over 20296.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01111, ecapa_loss=0.0001808, whisper_loss=0.09224, over 3911034.03 frames. ], batch size: 79, lr: 5.64e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:43:33,667 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 from AS
2024-08-12 09:43:36,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1576260.0, ans=0.125
2024-08-12 09:43:41,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.52 vs. limit=22.5
2024-08-12 09:43:41,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=12.0
2024-08-12 09:43:44,952 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 15 from LS+wenet, 24 from Vox, 35 from AS
2024-08-12 09:43:46,467 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 from AS
2024-08-12 09:43:53,947 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 09:44:08,059 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 from AS
2024-08-12 09:44:41,537 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12750, loss[loss=0.09784, beats_loss=0.01012, ecapa_loss=0.0001866, whisper_loss=0.08585, over 17507.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01114, ecapa_loss=0.00018, whisper_loss=0.09253, over 3901170.71 frames. ], batch size: 66, lr: 5.64e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:45:04,226 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 22 from Vox, 27 from AS
2024-08-12 09:45:12,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1576960.0, ans=0.0
2024-08-12 09:45:32,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1577060.0, ans=0.125
2024-08-12 09:45:43,172 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.554e+01 2.827e+01 3.190e+01 5.112e+01, threshold=5.654e+01, percent-clipped=0.0
2024-08-12 09:45:51,942 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12800, loss[loss=0.08764, beats_loss=0.01217, ecapa_loss=0.0002183, whisper_loss=0.07329, over 20662.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01116, ecapa_loss=0.0001805, whisper_loss=0.09237, over 3872577.17 frames. ], batch size: 84, lr: 5.64e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:45:52,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1577260.0, ans=0.125
2024-08-12 09:45:53,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1577260.0, ans=0.1
2024-08-12 09:45:56,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1577260.0, ans=0.2
2024-08-12 09:46:05,648 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 from AS
2024-08-12 09:46:20,672 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 21 from Vox, 27 from AS
2024-08-12 09:46:35,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1577560.0, ans=0.0
2024-08-12 09:46:40,415 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 from AS
2024-08-12 09:46:43,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1577560.0, ans=0.1
2024-08-12 09:46:44,519 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 29 from LS+wenet, 14 from Vox, 25 from AS
2024-08-12 09:46:50,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1577660.0, ans=0.0
2024-08-12 09:47:02,421 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12850, loss[loss=0.1001, beats_loss=0.01276, ecapa_loss=0.0001993, whisper_loss=0.08533, over 18067.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01115, ecapa_loss=0.0001816, whisper_loss=0.0915, over 3843247.89 frames. ], batch size: 76, lr: 5.64e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:47:17,275 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=12.0
2024-08-12 09:47:26,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1577860.0, ans=0.035
2024-08-12 09:47:26,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1577860.0, ans=0.1
2024-08-12 09:47:30,916 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 from AS
2024-08-12 09:47:45,026 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 25 from Vox, 39 from AS
2024-08-12 09:47:49,480 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=15.0
2024-08-12 09:47:50,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1578060.0, ans=0.125
2024-08-12 09:47:55,215 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.74 vs. limit=22.5
2024-08-12 09:48:04,279 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.513e+01 2.797e+01 3.147e+01 4.860e+01, threshold=5.595e+01, percent-clipped=0.0
2024-08-12 09:48:12,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12900, loss[loss=0.08049, beats_loss=0.01326, ecapa_loss=0.0001719, whisper_loss=0.06551, over 21212.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01119, ecapa_loss=0.00018, whisper_loss=0.09063, over 3844559.00 frames. ], batch size: 89, lr: 5.64e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:48:13,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1578260.0, ans=0.95
2024-08-12 09:48:26,703 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 8 from LS+wenet, 19 from Vox, 35 from AS
2024-08-12 09:48:33,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1578360.0, ans=0.125
2024-08-12 09:48:36,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1578360.0, ans=0.125
2024-08-12 09:48:39,083 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 20 from Vox, 28 from AS
2024-08-12 09:48:39,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1578460.0, ans=0.125
2024-08-12 09:48:50,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1578460.0, ans=0.5
2024-08-12 09:48:52,760 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0
2024-08-12 09:49:17,866 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 from AS
2024-08-12 09:49:21,797 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 12950, loss[loss=0.1102, beats_loss=0.01013, ecapa_loss=0.0002081, whisper_loss=0.09795, over 21601.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001814, whisper_loss=0.0919, over 3841679.11 frames. ], batch size: 87, lr: 5.63e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:49:29,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1578760.0, ans=0.0
2024-08-12 09:49:39,312 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 from AS
2024-08-12 09:49:50,947 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 14 from Vox, 37 from AS
2024-08-12 09:49:55,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1578960.0, ans=0.0
2024-08-12 09:50:21,947 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 from AS
2024-08-12 09:50:24,599 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.602e+01 2.996e+01 3.291e+01 5.195e+01, threshold=5.992e+01, percent-clipped=0.0
2024-08-12 09:50:33,650 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13000, loss[loss=0.09464, beats_loss=0.01221, ecapa_loss=0.0002113, whisper_loss=0.08032, over 19951.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01107, ecapa_loss=0.00018, whisper_loss=0.09139, over 3854299.06 frames. ], batch size: 86, lr: 5.63e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:50:34,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1579260.0, ans=10.0
2024-08-12 09:50:49,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1579360.0, ans=0.125
2024-08-12 09:50:49,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1579360.0, ans=0.0
2024-08-12 09:51:16,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1579560.0, ans=0.125
2024-08-12 09:51:18,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1579560.0, ans=0.0
2024-08-12 09:51:19,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1579560.0, ans=0.125
2024-08-12 09:51:42,006 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 11 from Vox, 31 from AS
2024-08-12 09:51:42,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.36 vs. limit=15.0
2024-08-12 09:51:44,520 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13050, loss[loss=0.124, beats_loss=0.01166, ecapa_loss=0.0001627, whisper_loss=0.1107, over 19237.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01115, ecapa_loss=0.0001786, whisper_loss=0.09103, over 3862873.70 frames. ], batch size: 75, lr: 5.63e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:51:45,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1579760.0, ans=0.125
2024-08-12 09:51:45,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1579760.0, ans=15.0
2024-08-12 09:51:47,408 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS
2024-08-12 09:51:49,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1579760.0, ans=0.07
2024-08-12 09:51:53,077 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 from AS
2024-08-12 09:52:02,037 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 09:52:06,081 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 from AS
2024-08-12 09:52:14,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1579960.0, ans=0.0
2024-08-12 09:52:46,613 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.449e+01 2.683e+01 3.089e+01 1.742e+02, threshold=5.367e+01, percent-clipped=1.0
2024-08-12 09:52:54,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13100, loss[loss=0.08525, beats_loss=0.01308, ecapa_loss=0.0001678, whisper_loss=0.07049, over 17408.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01115, ecapa_loss=0.0001774, whisper_loss=0.09093, over 3839654.50 frames. ], batch size: 70, lr: 5.63e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:53:07,806 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 from AS
2024-08-12 09:53:10,550 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 from AS
2024-08-12 09:53:15,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0
2024-08-12 09:53:30,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1580460.0, ans=0.5
2024-08-12 09:53:38,402 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 from AS
2024-08-12 09:53:40,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1580560.0, ans=10.0
2024-08-12 09:53:43,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1580560.0, ans=0.0
2024-08-12 09:53:44,476 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 20 from Vox, 37 from AS
2024-08-12 09:53:44,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1580560.0, ans=0.1
2024-08-12 09:53:50,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1580660.0, ans=0.2
2024-08-12 09:54:05,162 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13150, loss[loss=0.1062, beats_loss=0.01108, ecapa_loss=0.0001908, whisper_loss=0.09324, over 22397.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0112, ecapa_loss=0.0001771, whisper_loss=0.09114, over 3869236.57 frames. ], batch size: 91, lr: 5.63e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:54:16,959 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 from AS
2024-08-12 09:54:22,438 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 31 from LS+wenet, 29 from Vox, 35 from AS
2024-08-12 09:54:33,452 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0
2024-08-12 09:54:37,983 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0
2024-08-12 09:54:38,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1580960.0, ans=0.0
2024-08-12 09:54:44,073 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 8 from Vox, 27 from AS
2024-08-12 09:54:45,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1580960.0, ans=0.125
2024-08-12 09:54:55,540 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.545e+05
2024-08-12 09:55:01,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1581160.0, ans=0.0
2024-08-12 09:55:02,515 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 from AS
2024-08-12 09:55:03,123 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0
2024-08-12 09:55:07,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.633e+01 2.862e+01 3.411e+01 5.758e+01, threshold=5.724e+01, percent-clipped=1.0
2024-08-12 09:55:12,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1581160.0, ans=0.0
2024-08-12 09:55:14,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1581160.0, ans=0.125
2024-08-12 09:55:16,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13200, loss[loss=0.1114, beats_loss=0.01189, ecapa_loss=0.0002053, whisper_loss=0.0975, over 21948.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01124, ecapa_loss=0.0001781, whisper_loss=0.09021, over 3841826.44 frames. ], batch size: 93, lr: 5.63e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:55:17,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1581260.0, ans=0.0
2024-08-12 09:55:22,438 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 21 from Vox, 36 from AS
2024-08-12 09:55:37,954 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 from AS
2024-08-12 09:55:41,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1581360.0, ans=0.2
2024-08-12 09:55:53,497 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 23 from Vox, 23 from AS
2024-08-12 09:55:55,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.42 vs. limit=10.0
2024-08-12 09:55:57,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1581460.0, ans=0.1
2024-08-12 09:56:01,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1581560.0, ans=0.125
2024-08-12 09:56:07,947 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS
2024-08-12 09:56:12,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1581660.0, ans=0.0
2024-08-12 09:56:16,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1581660.0, ans=0.0
2024-08-12 09:56:18,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1581660.0, ans=0.125
2024-08-12 09:56:27,908 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13250, loss[loss=0.1139, beats_loss=0.009261, ecapa_loss=0.000244, whisper_loss=0.1022, over 21366.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0112, ecapa_loss=0.0001783, whisper_loss=0.0905, over 3842189.22 frames. ], batch size: 92, lr: 5.63e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:56:29,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1581760.0, ans=0.0
2024-08-12 09:56:33,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1581760.0, ans=0.0
2024-08-12 09:56:38,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1581760.0, ans=0.0
2024-08-12 09:56:59,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.12 vs. limit=12.0
2024-08-12 09:57:17,609 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS
2024-08-12 09:57:30,444 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.621e+01 2.894e+01 3.453e+01 5.158e+01, threshold=5.788e+01, percent-clipped=0.0
2024-08-12 09:57:35,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1582160.0, ans=0.125
2024-08-12 09:57:38,969 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13300, loss[loss=0.1014, beats_loss=0.01128, ecapa_loss=0.0001733, whisper_loss=0.08838, over 15618.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01121, ecapa_loss=0.0001775, whisper_loss=0.09102, over 3887443.29 frames. ], batch size: 60, lr: 5.63e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:57:52,481 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0
2024-08-12 09:57:54,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.74 vs. limit=15.0
2024-08-12 09:58:06,962 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 11 from Vox, 25 from AS
2024-08-12 09:58:12,505 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 from AS
2024-08-12 09:58:25,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1582560.0, ans=0.125
2024-08-12 09:58:49,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1582760.0, ans=0.0
2024-08-12 09:58:49,840 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13350, loss[loss=0.1387, beats_loss=0.007512, ecapa_loss=0.0001884, whisper_loss=0.1293, over 20894.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01116, ecapa_loss=0.0001775, whisper_loss=0.09178, over 3875190.34 frames. ], batch size: 82, lr: 5.63e-03, grad_scale: 5.764607523034235e+17
2024-08-12 09:58:50,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1582760.0, ans=0.125
2024-08-12 09:59:04,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1582860.0, ans=0.125
2024-08-12 09:59:34,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1583060.0, ans=0.0
2024-08-12 09:59:35,382 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 from AS
2024-08-12 09:59:43,589 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 from AS
2024-08-12 09:59:43,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1583060.0, ans=0.1
2024-08-12 09:59:43,909 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 09:59:46,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1583160.0, ans=0.2
2024-08-12 09:59:51,683 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.599e+01 2.960e+01 3.368e+01 5.094e+01, threshold=5.919e+01, percent-clipped=0.0
2024-08-12 10:00:00,229 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13400, loss[loss=0.07366, beats_loss=0.01488, ecapa_loss=0.0001722, whisper_loss=0.05705, over 16627.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01126, ecapa_loss=0.0001786, whisper_loss=0.09118, over 3892524.41 frames. ], batch size: 70, lr: 5.63e-03, grad_scale: 5.764607523034235e+17
2024-08-12 10:00:26,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1583360.0, ans=6.0
2024-08-12 10:00:41,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1583560.0, ans=0.125
2024-08-12 10:00:41,943 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.868e+01
2024-08-12 10:00:43,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1583560.0, ans=0.0
2024-08-12 10:00:51,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1583560.0, ans=0.0
2024-08-12 10:01:10,065 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13450, loss[loss=0.09062, beats_loss=0.009705, ecapa_loss=0.0001899, whisper_loss=0.07902, over 15000.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01127, ecapa_loss=0.0001791, whisper_loss=0.09078, over 3880784.28 frames. ], batch size: 58, lr: 5.63e-03, grad_scale: 5.764607523034235e+17
2024-08-12 10:01:42,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=22.5
2024-08-12 10:01:48,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1583960.0, ans=0.0
2024-08-12 10:01:52,298 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.20 vs. limit=10.0
2024-08-12 10:02:11,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.426e+01 2.699e+01 3.096e+01 4.776e+01, threshold=5.398e+01, percent-clipped=0.0
2024-08-12 10:02:11,294 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 21 from Vox, 22 from AS
2024-08-12 10:02:19,776 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13500, loss[loss=0.1144, beats_loss=0.01136, ecapa_loss=0.0001399, whisper_loss=0.1016, over 15810.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01114, ecapa_loss=0.0001795, whisper_loss=0.09162, over 3895465.20 frames. ], batch size: 60, lr: 5.62e-03, grad_scale: 5.764607523034235e+17
2024-08-12 10:02:20,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1584260.0, ans=0.125
2024-08-12 10:02:34,509 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 from AS
2024-08-12 10:02:34,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1584360.0, ans=0.035
2024-08-12 10:02:46,972 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 18 from Vox, 19 from AS
2024-08-12 10:02:50,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1584460.0, ans=0.125
2024-08-12 10:03:11,038 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 from AS
2024-08-12 10:03:30,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13550, loss[loss=0.1234, beats_loss=0.007769, ecapa_loss=0.0001985, whisper_loss=0.1136, over 15465.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01109, ecapa_loss=0.0001781, whisper_loss=0.09235, over 3903004.91 frames. ], batch size: 57, lr: 5.62e-03, grad_scale: 5.764607523034235e+17
2024-08-12 10:03:48,531 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 from AS
2024-08-12 10:04:30,432 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 12 from Vox, 35 from AS
2024-08-12 10:04:31,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.628e+01 2.875e+01 3.352e+01 5.913e+01, threshold=5.750e+01, percent-clipped=1.0
2024-08-12 10:04:40,302 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13600, loss[loss=0.1235, beats_loss=0.007711, ecapa_loss=0.0002339, whisper_loss=0.1134, over 17589.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01104, ecapa_loss=0.0001788, whisper_loss=0.09245, over 3846440.23 frames. ], batch size: 74, lr: 5.62e-03, grad_scale: 5.764607523034235e+17
2024-08-12 10:04:44,607 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 21 from LS+wenet, 21 from Vox, 43 from AS
2024-08-12 10:05:08,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1585460.0, ans=0.2
2024-08-12 10:05:24,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1585560.0, ans=0.0
2024-08-12 10:05:37,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1585660.0, ans=0.2
2024-08-12 10:05:42,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5
2024-08-12 10:05:43,467 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 9 from Vox, 35 from AS
2024-08-12 10:05:48,734 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13650, loss[loss=0.08845, beats_loss=0.01209, ecapa_loss=0.0002262, whisper_loss=0.07411, over 16683.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0112, ecapa_loss=0.0001796, whisper_loss=0.09139, over 3865703.75 frames. ], batch size: 67, lr: 5.62e-03, grad_scale: 5.764607523034235e+17
2024-08-12 10:05:49,279 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 from AS
2024-08-12 10:06:02,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1585860.0, ans=0.1
2024-08-12 10:06:05,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1585860.0, ans=0.125
2024-08-12 10:06:19,735 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 31 from LS+wenet, 12 from Vox, 35 from AS
2024-08-12 10:06:28,340 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 11 from Vox, 27 from AS
2024-08-12 10:06:48,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=15.0
2024-08-12 10:06:50,245 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.547e+01 2.720e+01 3.156e+01 5.627e+01, threshold=5.440e+01, percent-clipped=0.0
2024-08-12 10:06:52,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1586160.0, ans=0.0
2024-08-12 10:06:56,462 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 19 from LS+wenet, 21 from Vox, 42 from AS
2024-08-12 10:06:56,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1586160.0, ans=0.125
2024-08-12 10:06:58,156 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 from AS
2024-08-12 10:06:59,279 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13700, loss[loss=0.09251, beats_loss=0.01017, ecapa_loss=0.0002031, whisper_loss=0.0803, over 15575.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01122, ecapa_loss=0.0001783, whisper_loss=0.09138, over 3872902.19 frames. ], batch size: 64, lr: 5.62e-03, grad_scale: 5.764607523034235e+17
2024-08-12 10:06:59,505 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 from AS
2024-08-12 10:07:06,342 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 from AS
2024-08-12 10:07:09,172 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 11 from Vox, 33 from AS
2024-08-12 10:07:16,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1586360.0, ans=0.125
2024-08-12 10:07:16,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1586360.0, ans=0.125
2024-08-12 10:07:17,492 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 from AS
2024-08-12 10:07:17,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1586360.0, ans=0.0
2024-08-12 10:07:23,026 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 9 from Vox, 34 from AS
2024-08-12 10:07:33,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0
2024-08-12 10:07:36,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1586460.0, ans=0.2
2024-08-12 10:07:49,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1586560.0, ans=0.1
2024-08-12 10:07:52,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1586560.0, ans=0.1
2024-08-12 10:07:54,547 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 11 from Vox, 29 from AS
2024-08-12 10:07:57,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1586660.0, ans=0.0
2024-08-12 10:08:03,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1586660.0, ans=0.125
2024-08-12 10:08:09,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13750, loss[loss=0.1111, beats_loss=0.008962, ecapa_loss=0.0001963, whisper_loss=0.1002, over 23034.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01117, ecapa_loss=0.0001772, whisper_loss=0.09168, over 3862018.82 frames. ], batch size: 90, lr: 5.62e-03, grad_scale: 5.764607523034235e+17
2024-08-12 10:08:09,658 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 13 from Vox, 36 from AS
2024-08-12 10:08:24,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1586860.0, ans=0.125
2024-08-12 10:09:01,951 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts.
25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 10:09:06,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1587160.0, ans=0.2 2024-08-12 10:09:11,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.427e+01 2.784e+01 3.131e+01 5.573e+01, threshold=5.568e+01, percent-clipped=1.0 2024-08-12 10:09:13,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1587160.0, ans=0.035 2024-08-12 10:09:16,123 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 13 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 10:09:20,065 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13800, loss[loss=0.1029, beats_loss=0.01265, ecapa_loss=0.0001887, whisper_loss=0.08839, over 20071.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01108, ecapa_loss=0.0001777, whisper_loss=0.09195, over 3867681.71 frames. ], batch size: 84, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:09:29,267 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 10:09:38,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1587360.0, ans=0.125 2024-08-12 10:09:50,972 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-12 10:10:20,108 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-12 10:10:23,134 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 10:10:23,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1587660.0, ans=0.125 2024-08-12 10:10:23,616 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2024-08-12 10:10:24,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1587660.0, ans=0.125 2024-08-12 10:10:32,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13850, loss[loss=0.09958, beats_loss=0.01082, ecapa_loss=0.0001984, whisper_loss=0.08678, over 18633.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0111, ecapa_loss=0.0001769, whisper_loss=0.09159, over 3846734.57 frames. ], batch size: 76, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:10:42,773 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 10:10:45,922 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:11:10,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1587960.0, ans=0.125 2024-08-12 10:11:12,655 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 10:11:19,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1588060.0, ans=0.0 2024-08-12 10:11:31,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1588160.0, ans=0.125 2024-08-12 10:11:35,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.570e+01 2.844e+01 3.264e+01 2.322e+02, threshold=5.688e+01, percent-clipped=2.0 2024-08-12 10:11:37,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1588160.0, ans=0.025 2024-08-12 10:11:44,078 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13900, loss[loss=0.08861, beats_loss=0.01291, ecapa_loss=0.0001872, whisper_loss=0.07383, over 19902.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.000178, whisper_loss=0.0924, over 3868250.42 frames. ], batch size: 82, lr: 5.62e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:11:51,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0 2024-08-12 10:12:02,500 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-12 10:12:22,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1588460.0, ans=0.125 2024-08-12 10:12:23,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1588460.0, ans=0.1 2024-08-12 10:12:47,132 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
39 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 10:12:57,123 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:13:00,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 13950, loss[loss=0.09087, beats_loss=0.0123, ecapa_loss=0.0001642, whisper_loss=0.07693, over 22501.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01105, ecapa_loss=0.0001785, whisper_loss=0.09205, over 3878895.28 frames. ], batch size: 90, lr: 5.62e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:13:15,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1588760.0, ans=0.0 2024-08-12 10:13:43,216 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 10:13:53,460 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-12 10:13:56,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1589060.0, ans=0.015 2024-08-12 10:14:03,238 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 10:14:14,128 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.447e+01 2.683e+01 3.149e+01 1.029e+02, threshold=5.366e+01, percent-clipped=1.0 2024-08-12 10:14:23,675 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 14000, loss[loss=0.1086, beats_loss=0.008875, ecapa_loss=0.0001722, whisper_loss=0.09804, over 24143.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01104, ecapa_loss=0.000177, whisper_loss=0.0918, over 3864256.58 frames. 
], batch size: 91, lr: 5.62e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:14:34,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1589260.0, ans=0.0 2024-08-12 10:14:41,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1589360.0, ans=0.2 2024-08-12 10:14:46,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1589360.0, ans=0.125 2024-08-12 10:14:55,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1589460.0, ans=0.035 2024-08-12 10:15:02,874 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 10:15:11,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1589560.0, ans=0.2 2024-08-12 10:15:12,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1589560.0, ans=0.0 2024-08-12 10:15:39,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1589660.0, ans=0.125 2024-08-12 10:15:41,954 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 14050, loss[loss=0.09968, beats_loss=0.01216, ecapa_loss=0.0001748, whisper_loss=0.08577, over 14349.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01106, ecapa_loss=0.0001763, whisper_loss=0.09191, over 3850895.04 frames. ], batch size: 56, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:15:42,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1589760.0, ans=0.125 2024-08-12 10:15:50,519 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
21 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 10:16:04,410 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 10:16:08,470 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-08-12 10:16:14,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1589960.0, ans=0.1 2024-08-12 10:16:30,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1590060.0, ans=0.1 2024-08-12 10:16:40,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1590060.0, ans=0.1 2024-08-12 10:16:45,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1590060.0, ans=0.125 2024-08-12 10:16:46,439 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 10:16:48,039 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 10:16:54,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.562e+01 2.972e+01 3.503e+01 4.652e+01, threshold=5.944e+01, percent-clipped=0.0 2024-08-12 10:17:00,901 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 10:17:03,693 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 14100, loss[loss=0.1188, beats_loss=0.008709, ecapa_loss=0.0001944, whisper_loss=0.1082, over 21800.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01104, ecapa_loss=0.0001769, whisper_loss=0.09176, over 3870766.62 frames. 
], batch size: 89, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:17:13,244 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 10:17:23,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1590360.0, ans=0.0 2024-08-12 10:17:38,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1590460.0, ans=10.0 2024-08-12 10:17:40,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0 2024-08-12 10:17:53,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1590560.0, ans=0.125 2024-08-12 10:17:55,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1590560.0, ans=0.2 2024-08-12 10:18:00,170 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 10:18:06,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1590560.0, ans=0.125 2024-08-12 10:18:23,573 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 14150, loss[loss=0.1221, beats_loss=0.006807, ecapa_loss=0.0002061, whisper_loss=0.1132, over 17065.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01112, ecapa_loss=0.0001766, whisper_loss=0.0915, over 3871686.70 frames. ], batch size: 65, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:18:23,857 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
23 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-12 10:18:29,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1590760.0, ans=0.125 2024-08-12 10:18:43,907 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 10:18:48,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2024-08-12 10:19:02,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1590960.0, ans=0.125 2024-08-12 10:19:08,676 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 10:19:12,459 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-12 10:19:16,967 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 10:19:29,390 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.938e-01 2024-08-12 10:19:39,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.532e+01 2.801e+01 3.352e+01 7.282e+01, threshold=5.601e+01, percent-clipped=1.0 2024-08-12 10:19:40,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1591160.0, ans=0.1 2024-08-12 10:19:43,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1591160.0, ans=0.0 2024-08-12 10:19:43,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1591160.0, ans=0.125 2024-08-12 10:19:44,549 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 10:19:49,232 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 14200, loss[loss=0.1105, beats_loss=0.01219, ecapa_loss=0.0001993, whisper_loss=0.09633, over 22229.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.0001761, whisper_loss=0.0919, over 3878682.72 frames. ], batch size: 92, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:20:04,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-08-12 10:20:20,125 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 10:20:44,956 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 10:21:01,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1591660.0, ans=0.2 2024-08-12 10:21:08,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1591660.0, ans=0.125 2024-08-12 10:21:10,550 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 38 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 10:21:11,670 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 14250, loss[loss=0.1267, beats_loss=0.01027, ecapa_loss=0.0001647, whisper_loss=0.1147, over 23692.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01111, ecapa_loss=0.0001754, whisper_loss=0.09195, over 3872142.63 frames. ], batch size: 93, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:21:20,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1591760.0, ans=0.125 2024-08-12 10:21:21,243 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 27 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 10:21:39,882 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 20 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 10:22:00,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1592060.0, ans=0.0 2024-08-12 10:22:22,533 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.447e+01 2.773e+01 3.183e+01 5.230e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-12 10:22:33,103 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 14300, loss[loss=0.09611, beats_loss=0.01152, ecapa_loss=0.0001553, whisper_loss=0.08304, over 23087.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01117, ecapa_loss=0.0001754, whisper_loss=0.09086, over 3890586.01 frames. 
], batch size: 89, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:22:33,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1592260.0, ans=0.125 2024-08-12 10:22:49,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1592360.0, ans=0.125 2024-08-12 10:22:56,163 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-12 10:22:59,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1592360.0, ans=0.09899494936611666 2024-08-12 10:23:02,660 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 10:23:13,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1592460.0, ans=0.125 2024-08-12 10:23:18,052 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 10:23:21,458 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 8 from Vox, 35 fro AS 2024-08-12 10:23:44,343 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-12 10:23:55,933 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 14350, loss[loss=0.1183, beats_loss=0.009219, ecapa_loss=0.0001552, whisper_loss=0.1075, over 18358.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01111, ecapa_loss=0.0001762, whisper_loss=0.0907, over 3849072.16 frames. ], batch size: 68, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:24:12,410 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.94 vs. 
limit=22.5 2024-08-12 10:24:33,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1592960.0, ans=0.1 2024-08-12 10:24:57,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1593160.0, ans=0.0 2024-08-12 10:25:00,230 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.480e+01 2.799e+01 3.080e+01 4.714e+01, threshold=5.598e+01, percent-clipped=0.0 2024-08-12 10:25:08,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 14400, loss[loss=0.1102, beats_loss=0.01141, ecapa_loss=0.0001643, whisper_loss=0.09711, over 22544.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01122, ecapa_loss=0.0001778, whisper_loss=0.09045, over 3887261.91 frames. ], batch size: 89, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:25:20,572 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 10:25:20,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1593260.0, ans=10.0 2024-08-12 10:25:32,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1593360.0, ans=0.125 2024-08-12 10:25:48,346 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 10:25:50,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1593460.0, ans=0.2 2024-08-12 10:26:01,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1593560.0, ans=0.2 2024-08-12 10:26:04,458 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
26 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-12 10:26:12,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=1593660.0, ans=0.02 2024-08-12 10:26:21,823 INFO [train_multi_KD3.py:1116] (2/4) Epoch 11, batch 14450, loss[loss=0.1075, beats_loss=0.009357, ecapa_loss=0.0001774, whisper_loss=0.0964, over 21847.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01114, ecapa_loss=0.000179, whisper_loss=0.09105, over 3879514.58 frames. ], batch size: 85, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:26:31,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1593760.0, ans=0.125 2024-08-12 10:26:51,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1593960.0, ans=0.125 2024-08-12 10:27:02,351 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 17 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 10:27:09,451 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:27:09,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1594060.0, ans=0.125 2024-08-12 10:27:48,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 0, loss[loss=0.1045, beats_loss=0.01201, ecapa_loss=0.0001922, whisper_loss=0.09058, over 21981.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01201, ecapa_loss=0.0001922, whisper_loss=0.09058, over 21981.00 frames. 
], batch size: 93, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:27:48,672 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 10:28:01,097 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.9744, 1.9228, 2.0556, 2.5513], device='cuda:2') 2024-08-12 10:28:26,821 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on ASR_libri: loss=0.2553, beats_loss=0, ecapa_loss=0.0005949, whisper_loss=0.2493, over 922467.00 frames. 2024-08-12 10:28:43,352 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on SV_voxceleb1: loss=0.004912, beats_loss=0, ecapa_loss=0.0004912, whisper_loss=0, over 939242.00 frames. 2024-08-12 10:29:19,315 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2791, 1.7425, 1.7883, 2.4842], device='cuda:2') 2024-08-12 10:30:40,493 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on AT_audioset: loss=0.02433, beats_loss=0.02433, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 10:30:40,497 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 10:30:40,662 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
21 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-12 10:30:40,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1594110.0, ans=0.2 2024-08-12 10:30:46,832 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.681e-01 2024-08-12 10:30:56,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1594110.0, ans=10.0 2024-08-12 10:30:59,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1594110.0, ans=0.125 2024-08-12 10:30:59,941 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.491e+01 2.893e+01 3.197e+01 9.364e+01, threshold=5.786e+01, percent-clipped=1.0 2024-08-12 10:31:12,455 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 10:31:27,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1594310.0, ans=0.125 2024-08-12 10:31:36,513 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2024-08-12 10:31:49,859 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 10:32:00,015 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-12 10:32:04,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1594510.0, ans=0.0 2024-08-12 10:32:09,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1594510.0, ans=0.125 2024-08-12 10:32:16,573 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
19 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-12 10:32:24,834 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 50, loss[loss=0.1216, beats_loss=0.0104, ecapa_loss=0.0001984, whisper_loss=0.1092, over 23409.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001793, whisper_loss=0.09011, over 893049.66 frames. ], batch size: 92, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:32:37,376 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-12 10:33:22,332 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-12 10:33:25,321 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=12.0 2024-08-12 10:33:31,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1594910.0, ans=0.1 2024-08-12 10:33:34,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1594910.0, ans=0.07 2024-08-12 10:33:40,189 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 10:33:43,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1594910.0, ans=0.1 2024-08-12 10:33:56,216 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 10:34:14,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 100, loss[loss=0.1147, beats_loss=0.008216, ecapa_loss=0.0002002, whisper_loss=0.1045, over 21504.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01028, ecapa_loss=0.0001799, whisper_loss=0.09137, over 1553449.35 frames. 
], batch size: 84, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:34:29,024 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 23 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-12 10:34:33,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.43 vs. limit=22.5 2024-08-12 10:34:34,592 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+01 2.774e+01 3.018e+01 3.442e+01 6.372e+01, threshold=6.036e+01, percent-clipped=2.0 2024-08-12 10:34:46,237 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 10:34:46,874 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.93 vs. limit=22.5 2024-08-12 10:35:09,452 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 10:35:18,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1595310.0, ans=0.1 2024-08-12 10:35:29,541 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 10:35:33,481 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 10:35:46,509 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.54 vs. limit=15.0 2024-08-12 10:36:03,690 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 150, loss[loss=0.1016, beats_loss=0.008379, ecapa_loss=0.000197, whisper_loss=0.09121, over 14313.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01024, ecapa_loss=0.0001809, whisper_loss=0.0928, over 2054613.11 frames. 
], batch size: 57, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:36:23,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5 2024-08-12 10:36:23,366 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.81 vs. limit=10.0 2024-08-12 10:36:44,345 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. limit=10.0 2024-08-12 10:36:49,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1595810.0, ans=0.0 2024-08-12 10:36:50,869 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 10:36:56,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1595910.0, ans=0.1 2024-08-12 10:36:58,944 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 10:37:11,977 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.56 vs. limit=22.5 2024-08-12 10:37:17,108 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 10:37:31,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1596110.0, ans=0.1 2024-08-12 10:37:31,900 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 200, loss[loss=0.09979, beats_loss=0.009933, ecapa_loss=0.0002019, whisper_loss=0.08783, over 14666.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01038, ecapa_loss=0.0001812, whisper_loss=0.09275, over 2454905.27 frames. 
], batch size: 60, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:37:49,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.723e+01 2.993e+01 3.587e+01 5.466e+01, threshold=5.985e+01, percent-clipped=0.0 2024-08-12 10:38:31,678 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 10:38:43,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1596510.0, ans=0.125 2024-08-12 10:38:51,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1596510.0, ans=0.125 2024-08-12 10:38:59,237 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 250, loss[loss=0.08599, beats_loss=0.01001, ecapa_loss=0.0001751, whisper_loss=0.07423, over 17334.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01045, ecapa_loss=0.0001804, whisper_loss=0.09235, over 2758361.52 frames. ], batch size: 68, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:39:16,473 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 10 from Vox, 44 fro AS 2024-08-12 10:39:16,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1596710.0, ans=0.0 2024-08-12 10:39:26,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1596710.0, ans=0.0 2024-08-12 10:39:38,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1596810.0, ans=0.125 2024-08-12 10:39:53,753 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
22 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-12 10:39:55,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1596910.0, ans=0.0 2024-08-12 10:39:55,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1596910.0, ans=0.2 2024-08-12 10:40:01,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1596910.0, ans=0.125 2024-08-12 10:40:01,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.62 vs. limit=15.0 2024-08-12 10:40:04,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1597010.0, ans=0.1 2024-08-12 10:40:19,701 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 300, loss[loss=0.0792, beats_loss=0.01094, ecapa_loss=0.0001963, whisper_loss=0.06629, over 13699.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01056, ecapa_loss=0.0001806, whisper_loss=0.09171, over 2948293.22 frames. ], batch size: 58, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:40:22,485 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.70 vs. limit=22.5 2024-08-12 10:40:34,400 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.533e+01 2.859e+01 3.181e+01 4.204e+01, threshold=5.718e+01, percent-clipped=0.0 2024-08-12 10:40:34,626 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 10:40:39,857 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
26 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-12 10:40:57,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1597310.0, ans=0.2 2024-08-12 10:41:06,308 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 10:41:16,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1597410.0, ans=0.0 2024-08-12 10:41:39,849 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 350, loss[loss=0.1037, beats_loss=0.007883, ecapa_loss=0.000218, whisper_loss=0.0936, over 16059.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.0001793, whisper_loss=0.09108, over 3136423.62 frames. ], batch size: 63, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:41:40,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1597610.0, ans=6.0 2024-08-12 10:42:31,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1597910.0, ans=0.125 2024-08-12 10:42:37,412 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 10:42:57,169 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 400, loss[loss=0.1047, beats_loss=0.01308, ecapa_loss=0.0001566, whisper_loss=0.09, over 22636.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001778, whisper_loss=0.0903, over 3257456.08 frames. ], batch size: 90, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:43:11,802 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.512e+01 2.716e+01 3.145e+01 4.909e+01, threshold=5.433e+01, percent-clipped=0.0 2024-08-12 10:43:12,108 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 10:43:19,638 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 10:43:28,250 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=23.08 vs. limit=22.5 2024-08-12 10:43:44,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1598410.0, ans=0.0 2024-08-12 10:44:15,441 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 450, loss[loss=0.1045, beats_loss=0.01306, ecapa_loss=0.0001463, whisper_loss=0.08996, over 22410.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.0001763, whisper_loss=0.09029, over 3401136.35 frames. ], batch size: 88, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:44:20,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1598610.0, ans=0.0 2024-08-12 10:44:32,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1598710.0, ans=0.0 2024-08-12 10:45:05,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1598910.0, ans=0.0 2024-08-12 10:45:12,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1598910.0, ans=0.0 2024-08-12 10:45:32,809 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 500, loss[loss=0.1177, beats_loss=0.00845, ecapa_loss=0.0001818, whisper_loss=0.1075, over 21627.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001756, whisper_loss=0.09127, over 3501962.14 frames. 
], batch size: 83, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:45:33,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1599110.0, ans=0.1 2024-08-12 10:45:33,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1599110.0, ans=0.0 2024-08-12 10:45:46,833 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.446e+01 2.825e+01 3.305e+01 5.621e+01, threshold=5.651e+01, percent-clipped=2.0 2024-08-12 10:45:56,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1599210.0, ans=0.125 2024-08-12 10:46:14,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1599310.0, ans=0.0 2024-08-12 10:46:15,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1599310.0, ans=0.125 2024-08-12 10:46:33,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0 2024-08-12 10:46:48,007 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 10:46:52,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1599610.0, ans=10.0 2024-08-12 10:46:52,769 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 550, loss[loss=0.1259, beats_loss=0.01016, ecapa_loss=0.0001746, whisper_loss=0.114, over 18476.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.0001755, whisper_loss=0.09115, over 3551002.48 frames. 
], batch size: 73, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:47:19,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1599710.0, ans=0.125 2024-08-12 10:47:21,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2024-08-12 10:47:32,684 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 10:47:46,331 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 10:48:00,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1600010.0, ans=0.125 2024-08-12 10:48:14,680 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 600, loss[loss=0.1023, beats_loss=0.009909, ecapa_loss=0.0002185, whisper_loss=0.09023, over 16555.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01067, ecapa_loss=0.0001758, whisper_loss=0.09129, over 3608230.79 frames. ], batch size: 69, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:48:15,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.88 vs. limit=22.5 2024-08-12 10:48:27,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1600110.0, ans=0.125 2024-08-12 10:48:28,526 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.536e+01 2.795e+01 3.405e+01 6.348e+01, threshold=5.590e+01, percent-clipped=1.0 2024-08-12 10:48:28,798 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
36 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 10:48:35,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.39 vs. limit=22.5 2024-08-12 10:48:42,466 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 10:48:43,912 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 10:49:03,126 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-12 10:49:11,493 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 19 from LS+wenet, 15 from Vox, 49 fro AS 2024-08-12 10:49:31,242 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 650, loss[loss=0.07135, beats_loss=0.01105, ecapa_loss=0.00015, whisper_loss=0.05879, over 14943.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001751, whisper_loss=0.09084, over 3659242.80 frames. ], batch size: 57, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:49:39,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1600610.0, ans=0.0 2024-08-12 10:49:41,674 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
29 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 10:49:44,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1600610.0, ans=0.0 2024-08-12 10:49:54,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1600710.0, ans=0.0 2024-08-12 10:49:55,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1600710.0, ans=0.2 2024-08-12 10:50:33,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1600910.0, ans=0.125 2024-08-12 10:50:46,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1601010.0, ans=0.125 2024-08-12 10:50:52,213 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 700, loss[loss=0.08439, beats_loss=0.01196, ecapa_loss=0.0001369, whisper_loss=0.07107, over 18417.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001753, whisper_loss=0.0907, over 3670919.38 frames. ], batch size: 69, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:50:59,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1601110.0, ans=0.1 2024-08-12 10:51:06,157 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.429e+01 2.647e+01 2.906e+01 4.054e+01, threshold=5.293e+01, percent-clipped=0.0 2024-08-12 10:51:08,424 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-12 10:51:11,375 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 10:51:36,154 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
31 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-12 10:51:38,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1601410.0, ans=0.0 2024-08-12 10:52:09,573 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 30 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 10:52:10,699 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 750, loss[loss=0.1249, beats_loss=0.01045, ecapa_loss=0.0001704, whisper_loss=0.1127, over 19914.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01081, ecapa_loss=0.000174, whisper_loss=0.09057, over 3712783.27 frames. ], batch size: 78, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:52:12,473 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 13 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 10:52:13,945 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 10:52:19,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1601610.0, ans=0.125 2024-08-12 10:52:22,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=15.0 2024-08-12 10:52:27,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1601710.0, ans=0.125 2024-08-12 10:52:58,813 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-12 10:53:11,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1601910.0, ans=0.125 2024-08-12 10:53:12,451 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 10:53:29,897 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 800, loss[loss=0.12, beats_loss=0.009372, ecapa_loss=0.00021, whisper_loss=0.1085, over 22745.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.0001751, whisper_loss=0.09063, over 3736031.78 frames. ], batch size: 91, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:53:45,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.463e+01 2.797e+01 3.235e+01 6.542e+01, threshold=5.594e+01, percent-clipped=1.0 2024-08-12 10:53:48,960 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-12 10:53:55,723 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.203e+05 2024-08-12 10:53:55,939 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-08-12 10:54:00,718 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2024-08-12 10:54:03,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1602310.0, ans=0.125 2024-08-12 10:54:12,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1602310.0, ans=0.125 2024-08-12 10:54:50,064 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 850, loss[loss=0.07016, beats_loss=0.01284, ecapa_loss=0.0001874, whisper_loss=0.05545, over 15170.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01081, ecapa_loss=0.0001749, whisper_loss=0.09002, over 3729203.77 frames. 
], batch size: 64, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:55:17,482 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 10:55:21,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1602810.0, ans=0.0 2024-08-12 10:55:44,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2024-08-12 10:55:51,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1602910.0, ans=0.0 2024-08-12 10:55:52,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-12 10:55:55,244 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 10:55:57,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-12 10:56:06,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1603010.0, ans=0.125 2024-08-12 10:56:07,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0 2024-08-12 10:56:09,049 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 900, loss[loss=0.1132, beats_loss=0.01036, ecapa_loss=0.0001693, whisper_loss=0.1011, over 22674.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01084, ecapa_loss=0.0001721, whisper_loss=0.09058, over 3760903.12 frames. ], batch size: 93, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:56:15,845 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 10:56:24,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1603110.0, ans=0.05 2024-08-12 10:56:29,431 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.446e+01 2.685e+01 3.025e+01 4.659e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-12 10:56:30,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1603210.0, ans=0.1 2024-08-12 10:56:36,157 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-12 10:56:38,251 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-12 10:56:40,748 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-12 10:56:48,294 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 10:56:55,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1603310.0, ans=0.125 2024-08-12 10:57:05,312 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:57:11,926 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-12 10:57:33,395 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 950, loss[loss=0.1258, beats_loss=0.01069, ecapa_loss=0.000144, whisper_loss=0.1137, over 21113.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.0001724, whisper_loss=0.09174, over 3786917.54 frames. ], batch size: 78, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:57:40,108 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 10:57:48,245 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=15.0 2024-08-12 10:58:05,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1603710.0, ans=0.125 2024-08-12 10:58:46,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.69 vs. limit=15.0 2024-08-12 10:58:49,507 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:59:03,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1604010.0, ans=0.0 2024-08-12 10:59:10,591 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1000, loss[loss=0.1011, beats_loss=0.01309, ecapa_loss=0.0001895, whisper_loss=0.08608, over 19451.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01083, ecapa_loss=0.000172, whisper_loss=0.09145, over 3798636.97 frames. ], batch size: 80, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:59:12,906 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 19 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 10:59:19,283 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 10:59:20,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1604110.0, ans=0.125 2024-08-12 10:59:28,329 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 10:59:30,696 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 10:59:32,889 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.574e+01 2.849e+01 3.275e+01 5.377e+01, threshold=5.697e+01, percent-clipped=1.0 2024-08-12 10:59:37,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1604210.0, ans=0.125 2024-08-12 10:59:52,570 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 11:00:03,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1604310.0, ans=0.2 2024-08-12 11:00:12,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1604410.0, ans=0.125 2024-08-12 11:00:25,867 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 11:00:28,199 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 11:00:39,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1604510.0, ans=0.125 2024-08-12 11:00:58,814 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1050, loss[loss=0.08647, beats_loss=0.01385, ecapa_loss=0.0001177, whisper_loss=0.07144, over 18323.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01089, ecapa_loss=0.0001724, whisper_loss=0.09029, over 3777002.73 frames. ], batch size: 70, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:00:59,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.88 vs. 
limit=15.0 2024-08-12 11:01:07,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1604610.0, ans=0.04949747468305833 2024-08-12 11:01:49,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1604810.0, ans=0.125 2024-08-12 11:01:52,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1604810.0, ans=0.1 2024-08-12 11:01:55,937 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 11:02:50,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1605010.0, ans=0.0 2024-08-12 11:03:01,266 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1100, loss[loss=0.08869, beats_loss=0.01389, ecapa_loss=0.0001507, whisper_loss=0.07329, over 19177.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01093, ecapa_loss=0.0001713, whisper_loss=0.08995, over 3786787.99 frames. ], batch size: 78, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:03:07,175 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 11:03:10,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1605110.0, ans=0.125 2024-08-12 11:03:13,438 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 11:03:27,188 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.546e+01 2.827e+01 3.274e+01 5.638e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 11:03:33,853 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
25 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-12 11:03:38,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1605210.0, ans=0.125 2024-08-12 11:05:02,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1605510.0, ans=0.1 2024-08-12 11:05:09,783 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1150, loss[loss=0.09499, beats_loss=0.008972, ecapa_loss=0.0001735, whisper_loss=0.08428, over 16352.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01088, ecapa_loss=0.0001719, whisper_loss=0.08985, over 3784157.88 frames. ], batch size: 63, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:05:41,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1605710.0, ans=0.0 2024-08-12 11:05:48,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1605710.0, ans=0.1 2024-08-12 11:06:18,306 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0 2024-08-12 11:06:46,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1605910.0, ans=0.125 2024-08-12 11:06:53,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.97 vs. 
limit=10.0 2024-08-12 11:07:05,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1606010.0, ans=0.0 2024-08-12 11:07:14,640 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1200, loss[loss=0.1141, beats_loss=0.0112, ecapa_loss=0.0001453, whisper_loss=0.1014, over 20511.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01094, ecapa_loss=0.0001728, whisper_loss=0.08987, over 3816922.21 frames. ], batch size: 79, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:07:26,069 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 35 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 11:07:36,820 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.362e+01 2.610e+01 2.988e+01 4.824e+01, threshold=5.220e+01, percent-clipped=0.0 2024-08-12 11:07:49,794 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 11:07:59,407 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-12 11:08:51,898 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 35 from Vox, 34 fro AS 2024-08-12 11:09:00,412 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 11:09:02,632 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2024-08-12 11:09:03,126 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1250, loss[loss=0.1175, beats_loss=0.01035, ecapa_loss=0.0001652, whisper_loss=0.1055, over 22042.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0109, ecapa_loss=0.0001718, whisper_loss=0.09069, over 3823341.87 frames. 
], batch size: 88, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:10:12,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2024-08-12 11:10:19,780 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-12 11:10:27,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1607110.0, ans=0.125 2024-08-12 11:10:28,169 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1300, loss[loss=0.08601, beats_loss=0.009649, ecapa_loss=0.0001698, whisper_loss=0.07466, over 18170.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01101, ecapa_loss=0.0001704, whisper_loss=0.08967, over 3837844.51 frames. ], batch size: 71, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:10:44,495 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.467e+01 2.705e+01 3.116e+01 5.074e+01, threshold=5.411e+01, percent-clipped=0.0 2024-08-12 11:10:48,386 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 11:10:53,161 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 11:10:55,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1607210.0, ans=0.1 2024-08-12 11:11:18,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1607410.0, ans=0.125 2024-08-12 11:11:49,354 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1350, loss[loss=0.09473, beats_loss=0.01006, ecapa_loss=0.0002119, whisper_loss=0.08255, over 15055.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01102, ecapa_loss=0.0001705, whisper_loss=0.0895, over 3825817.22 frames. 
], batch size: 60, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:11:58,864 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 11:12:19,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=12.0 2024-08-12 11:12:19,820 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.21 vs. limit=15.0 2024-08-12 11:12:33,899 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 11:12:52,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1607910.0, ans=0.0 2024-08-12 11:12:54,091 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 11:12:56,877 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 11:13:08,107 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.15 vs. limit=22.5 2024-08-12 11:13:11,489 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1400, loss[loss=0.1144, beats_loss=0.01219, ecapa_loss=0.0001646, whisper_loss=0.1006, over 22444.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01102, ecapa_loss=0.0001701, whisper_loss=0.08969, over 3820033.76 frames. 
], batch size: 88, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:13:18,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1608110.0, ans=0.125 2024-08-12 11:13:27,792 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.408e+01 2.816e+01 3.296e+01 5.087e+01, threshold=5.632e+01, percent-clipped=0.0 2024-08-12 11:13:28,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1608210.0, ans=0.125 2024-08-12 11:13:58,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1608310.0, ans=0.125 2024-08-12 11:14:07,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1608410.0, ans=0.125 2024-08-12 11:14:09,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1608410.0, ans=0.125 2024-08-12 11:14:10,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1608410.0, ans=0.2 2024-08-12 11:14:26,845 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 19 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-12 11:14:34,844 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1450, loss[loss=0.08098, beats_loss=0.01373, ecapa_loss=0.0001865, whisper_loss=0.06538, over 17818.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01098, ecapa_loss=0.0001701, whisper_loss=0.0891, over 3812435.90 frames. ], batch size: 73, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:15:12,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1608710.0, ans=0.0 2024-08-12 11:15:40,437 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
26 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 11:15:49,072 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 11:16:12,834 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 11:16:21,068 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 11:16:22,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1500, loss[loss=0.1092, beats_loss=0.01021, ecapa_loss=0.0001777, whisper_loss=0.09717, over 18728.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01098, ecapa_loss=0.000168, whisper_loss=0.08967, over 3799414.97 frames. ], batch size: 76, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:16:24,268 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 11:16:24,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1609110.0, ans=0.0 2024-08-12 11:16:38,281 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.429e+01 2.735e+01 3.054e+01 5.898e+01, threshold=5.470e+01, percent-clipped=1.0 2024-08-12 11:16:40,565 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 11:17:05,031 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 11:17:19,481 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=15.32 vs. 
limit=12.0 2024-08-12 11:17:44,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1609510.0, ans=0.1 2024-08-12 11:17:52,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1550, loss[loss=0.09189, beats_loss=0.01309, ecapa_loss=0.0001315, whisper_loss=0.07749, over 14535.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01092, ecapa_loss=0.0001684, whisper_loss=0.08968, over 3796925.74 frames. ], batch size: 59, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:17:54,960 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 11:17:59,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1609610.0, ans=0.2 2024-08-12 11:17:59,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1609610.0, ans=0.125 2024-08-12 11:17:59,327 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.77 vs. limit=22.5 2024-08-12 11:18:07,663 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-12 11:18:13,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1609710.0, ans=0.0 2024-08-12 11:18:25,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1609710.0, ans=0.125 2024-08-12 11:18:34,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1609810.0, ans=0.125 2024-08-12 11:18:55,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.51 vs. 
limit=5.0 2024-08-12 11:18:56,269 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-12 11:19:04,324 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-12 11:19:05,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1610010.0, ans=0.0 2024-08-12 11:19:19,293 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1600, loss[loss=0.1007, beats_loss=0.00943, ecapa_loss=0.0001463, whisper_loss=0.08978, over 17729.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01085, ecapa_loss=0.0001688, whisper_loss=0.09069, over 3815924.21 frames. ], batch size: 67, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:19:35,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1610210.0, ans=0.125 2024-08-12 11:19:36,512 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.594e+01 2.878e+01 3.251e+01 6.117e+01, threshold=5.756e+01, percent-clipped=2.0 2024-08-12 11:19:40,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1610210.0, ans=0.125 2024-08-12 11:19:51,798 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-12 11:20:04,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1610310.0, ans=0.2 2024-08-12 11:20:24,137 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
23 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 11:20:28,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1610510.0, ans=0.125 2024-08-12 11:20:30,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1610510.0, ans=15.0 2024-08-12 11:20:43,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1610510.0, ans=0.125 2024-08-12 11:20:45,802 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1650, loss[loss=0.08761, beats_loss=0.0125, ecapa_loss=0.000187, whisper_loss=0.07325, over 15066.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01098, ecapa_loss=0.000169, whisper_loss=0.0907, over 3846840.55 frames. ], batch size: 61, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:20:57,217 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-08-12 11:20:58,039 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 11:20:58,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1610610.0, ans=0.125 2024-08-12 11:20:58,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1610610.0, ans=0.125 2024-08-12 11:21:05,765 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0 2024-08-12 11:21:14,932 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
31 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 11:21:30,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-08-12 11:21:41,680 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=17.63 vs. limit=15.0 2024-08-12 11:21:46,872 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 11:21:54,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1611010.0, ans=0.0 2024-08-12 11:21:54,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1611010.0, ans=0.125 2024-08-12 11:22:02,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1611010.0, ans=0.125 2024-08-12 11:22:08,522 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1700, loss[loss=0.09865, beats_loss=0.008763, ecapa_loss=0.0002063, whisper_loss=0.08782, over 14064.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01091, ecapa_loss=0.0001698, whisper_loss=0.09109, over 3829145.89 frames. ], batch size: 56, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:22:11,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1611110.0, ans=0.0 2024-08-12 11:22:24,324 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.487e+01 2.798e+01 3.265e+01 1.299e+02, threshold=5.596e+01, percent-clipped=2.0 2024-08-12 11:22:31,214 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 11:22:34,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1611210.0, ans=0.125 2024-08-12 11:22:42,798 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 11:22:45,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1611310.0, ans=0.125 2024-08-12 11:22:45,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1611310.0, ans=0.0 2024-08-12 11:22:54,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1611310.0, ans=10.0 2024-08-12 11:22:59,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1611410.0, ans=0.1 2024-08-12 11:22:59,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1611410.0, ans=0.125 2024-08-12 11:23:23,801 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 30 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 11:23:29,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1750, loss[loss=0.1032, beats_loss=0.01022, ecapa_loss=0.0001412, whisper_loss=0.09152, over 16861.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01094, ecapa_loss=0.0001692, whisper_loss=0.09114, over 3864232.40 frames. ], batch size: 61, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:23:42,699 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
25 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 11:23:42,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1611610.0, ans=0.04949747468305833 2024-08-12 11:24:07,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1611810.0, ans=0.125 2024-08-12 11:24:10,624 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-12 11:24:14,240 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.63 vs. limit=15.0 2024-08-12 11:24:18,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1611910.0, ans=0.035 2024-08-12 11:24:24,123 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 11:24:28,725 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-12 11:24:30,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1611910.0, ans=0.125 2024-08-12 11:24:49,890 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1800, loss[loss=0.1057, beats_loss=0.008489, ecapa_loss=0.0001576, whisper_loss=0.09566, over 15549.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001702, whisper_loss=0.09127, over 3862217.51 frames. 
], batch size: 57, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:24:55,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1612110.0, ans=0.125 2024-08-12 11:25:05,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.474e+01 2.742e+01 2.995e+01 4.904e+01, threshold=5.483e+01, percent-clipped=0.0 2024-08-12 11:25:11,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1612210.0, ans=0.0 2024-08-12 11:25:22,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1612310.0, ans=0.125 2024-08-12 11:25:25,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1612310.0, ans=0.125 2024-08-12 11:25:31,614 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 11:25:36,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1612410.0, ans=10.0 2024-08-12 11:25:50,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1612410.0, ans=0.1 2024-08-12 11:25:58,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1612510.0, ans=0.125 2024-08-12 11:26:08,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1612510.0, ans=0.0 2024-08-12 11:26:14,031 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.070e-01 2024-08-12 11:26:14,767 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1850, loss[loss=0.09525, beats_loss=0.009324, ecapa_loss=0.0002098, 
whisper_loss=0.08383, over 16264.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01091, ecapa_loss=0.0001696, whisper_loss=0.09055, over 3866116.47 frames. ], batch size: 65, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:26:18,373 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 11:26:18,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1612610.0, ans=0.0 2024-08-12 11:27:06,057 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 11:27:17,993 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 11:27:22,019 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 11:27:33,362 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 11:27:51,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1613010.0, ans=0.2 2024-08-12 11:28:04,054 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1900, loss[loss=0.09008, beats_loss=0.01311, ecapa_loss=0.0001929, whisper_loss=0.07504, over 19053.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01098, ecapa_loss=0.000171, whisper_loss=0.09015, over 3859281.58 frames. ], batch size: 80, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:28:13,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1613110.0, ans=0.125 2024-08-12 11:28:22,296 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
23 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-12 11:28:26,596 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.546e+01 2.864e+01 3.475e+01 5.350e+01, threshold=5.728e+01, percent-clipped=0.0 2024-08-12 11:28:38,568 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 11:29:13,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1613410.0, ans=0.0 2024-08-12 11:29:36,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2024-08-12 11:29:40,303 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 17 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-12 11:29:44,748 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 1950, loss[loss=0.1257, beats_loss=0.01001, ecapa_loss=0.0002057, whisper_loss=0.1136, over 22320.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01097, ecapa_loss=0.0001727, whisper_loss=0.09058, over 3872107.47 frames. ], batch size: 89, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:29:59,651 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-12 11:30:03,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1613710.0, ans=0.125 2024-08-12 11:30:11,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1613710.0, ans=0.125 2024-08-12 11:30:14,113 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
25 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 11:30:22,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1613810.0, ans=0.125 2024-08-12 11:30:23,759 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 11:30:27,419 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.548e-01 2024-08-12 11:30:28,858 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 11:30:30,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1613810.0, ans=0.125 2024-08-12 11:30:33,358 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 15 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 11:30:41,727 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 11:30:43,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. limit=6.0 2024-08-12 11:30:52,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1614010.0, ans=0.0 2024-08-12 11:30:54,745 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 38 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 11:31:05,515 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2000, loss[loss=0.09229, beats_loss=0.01036, ecapa_loss=0.000185, whisper_loss=0.08008, over 20681.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.011, ecapa_loss=0.0001742, whisper_loss=0.09004, over 3882370.86 frames. 
], batch size: 80, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:31:10,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2024-08-12 11:31:20,979 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.474e+01 2.700e+01 3.035e+01 6.607e+01, threshold=5.401e+01, percent-clipped=2.0 2024-08-12 11:31:21,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1614210.0, ans=0.125 2024-08-12 11:31:26,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1614210.0, ans=0.125 2024-08-12 11:31:28,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1614210.0, ans=0.1 2024-08-12 11:31:39,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1614310.0, ans=0.1 2024-08-12 11:32:02,392 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-12 11:32:16,024 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.064e+02 2024-08-12 11:32:24,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1614610.0, ans=0.125 2024-08-12 11:32:24,766 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2050, loss[loss=0.1082, beats_loss=0.009754, ecapa_loss=0.0001732, whisper_loss=0.09667, over 22731.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01099, ecapa_loss=0.0001736, whisper_loss=0.08985, over 3883792.46 frames. 
], batch size: 89, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:32:48,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=12.0 2024-08-12 11:33:24,593 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-12 11:33:26,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1614910.0, ans=0.1 2024-08-12 11:33:30,303 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 11:33:46,548 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2100, loss[loss=0.1051, beats_loss=0.01239, ecapa_loss=0.0001819, whisper_loss=0.09085, over 18989.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01099, ecapa_loss=0.000173, whisper_loss=0.08942, over 3842809.90 frames. ], batch size: 78, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:33:48,268 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 11:33:48,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1615110.0, ans=0.125 2024-08-12 11:34:02,326 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.515e+01 2.855e+01 3.226e+01 9.750e+01, threshold=5.709e+01, percent-clipped=2.0 2024-08-12 11:34:34,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1615410.0, ans=0.1 2024-08-12 11:34:35,313 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
20 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-12 11:34:36,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=1615410.0, ans=0.2 2024-08-12 11:34:38,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1615410.0, ans=0.025 2024-08-12 11:34:39,047 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 11:35:01,915 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 11:35:04,449 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2150, loss[loss=0.1241, beats_loss=0.009575, ecapa_loss=0.0001535, whisper_loss=0.113, over 23552.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01108, ecapa_loss=0.000171, whisper_loss=0.08929, over 3856069.97 frames. ], batch size: 89, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:35:09,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.17 vs. limit=15.0 2024-08-12 11:35:13,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1615610.0, ans=0.125 2024-08-12 11:35:17,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1615610.0, ans=0.2 2024-08-12 11:35:25,185 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 11:35:43,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0 2024-08-12 11:35:50,222 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
25 from LS+wenet, 13 from Vox, 28 from AS 2024-08-12 11:35:53,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1615910.0, ans=0.0 2024-08-12 11:36:06,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1616010.0, ans=0.125 2024-08-12 11:36:10,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-08-12 11:36:19,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1616010.0, ans=0.0 2024-08-12 11:36:23,484 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2200, loss[loss=0.09107, beats_loss=0.01049, ecapa_loss=0.0001965, whisper_loss=0.07861, over 13938.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01106, ecapa_loss=0.0001714, whisper_loss=0.08951, over 3819589.54 frames. ], batch size: 57, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:36:30,185 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
19 from LS+wenet, 9 from Vox, 25 from AS 2024-08-12 11:36:40,647 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.491e+01 2.779e+01 3.104e+01 1.679e+02, threshold=5.558e+01, percent-clipped=1.0 2024-08-12 11:36:44,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1616210.0, ans=0.0 2024-08-12 11:36:59,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1616310.0, ans=0.0 2024-08-12 11:37:03,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1616310.0, ans=0.125 2024-08-12 11:37:10,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1616310.0, ans=0.1 2024-08-12 11:37:12,418 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.30 vs. limit=15.0 2024-08-12 11:37:13,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1616410.0, ans=0.125 2024-08-12 11:37:15,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1616410.0, ans=0.125 2024-08-12 11:37:16,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1616410.0, ans=0.125 2024-08-12 11:37:32,438 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 from AS 2024-08-12 11:37:44,841 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2250, loss[loss=0.1126, beats_loss=0.009352, ecapa_loss=0.0002084, whisper_loss=0.1011, over 15102.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01111, ecapa_loss=0.0001722, whisper_loss=0.08973, over 3810624.82 frames. ], batch size: 61, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:37:53,804 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 17 from Vox, 37 from AS 2024-08-12 11:38:07,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1616710.0, ans=0.125 2024-08-12 11:38:07,932 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.99 vs. limit=22.5 2024-08-12 11:38:19,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5 2024-08-12 11:38:22,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1616810.0, ans=0.0 2024-08-12 11:38:23,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1616810.0, ans=0.1 2024-08-12 11:38:23,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1616810.0, ans=0.2 2024-08-12 11:38:43,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1616910.0, ans=0.0 2024-08-12 11:38:44,809 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 from AS 2024-08-12 11:38:54,129 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
18 from LS+wenet, 13 from Vox, 27 from AS 2024-08-12 11:38:56,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1617010.0, ans=0.125 2024-08-12 11:38:57,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1617010.0, ans=0.125 2024-08-12 11:39:01,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1617010.0, ans=0.125 2024-08-12 11:39:05,906 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2300, loss[loss=0.08804, beats_loss=0.01125, ecapa_loss=0.0002012, whisper_loss=0.07477, over 17351.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01104, ecapa_loss=0.0001733, whisper_loss=0.091, over 3825210.44 frames. ], batch size: 71, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:39:22,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.578e+01 2.776e+01 3.127e+01 7.036e+01, threshold=5.552e+01, percent-clipped=1.0 2024-08-12 11:39:29,267 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=8.102e-03 2024-08-12 11:39:40,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1617310.0, ans=0.2 2024-08-12 11:39:49,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1617310.0, ans=0.1 2024-08-12 11:39:55,833 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
18 from LS+wenet, 16 from Vox, 31 from AS 2024-08-12 11:40:10,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1617510.0, ans=0.125 2024-08-12 11:40:17,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1617510.0, ans=0.025 2024-08-12 11:40:25,811 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2350, loss[loss=0.09396, beats_loss=0.0121, ecapa_loss=0.0001616, whisper_loss=0.08025, over 21861.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01094, ecapa_loss=0.0001754, whisper_loss=0.09264, over 3870191.17 frames. ], batch size: 88, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:40:31,028 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 from AS 2024-08-12 11:41:05,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1617810.0, ans=0.1 2024-08-12 11:41:19,208 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 13 from Vox, 39 from AS 2024-08-12 11:41:35,353 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 34 from LS+wenet, 17 from Vox, 27 from AS 2024-08-12 11:41:37,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1618010.0, ans=0.125 2024-08-12 11:41:39,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1618010.0, ans=0.1 2024-08-12 11:41:47,760 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2400, loss[loss=0.1064, beats_loss=0.01119, ecapa_loss=0.0002041, whisper_loss=0.09319, over 20586.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01093, ecapa_loss=0.0001749, whisper_loss=0.09301, over 3900656.70 frames. 
], batch size: 84, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:41:52,632 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 14 from Vox, 41 from AS 2024-08-12 11:42:03,081 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.471e+01 2.708e+01 3.082e+01 4.957e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-12 11:42:11,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1618210.0, ans=0.09899494936611666 2024-08-12 11:42:19,120 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 20 from Vox, 22 from AS 2024-08-12 11:43:00,114 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 24 from Vox, 25 from AS 2024-08-12 11:43:06,419 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2450, loss[loss=0.1355, beats_loss=0.008433, ecapa_loss=0.0002181, whisper_loss=0.1249, over 22252.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01084, ecapa_loss=0.0001751, whisper_loss=0.09341, over 3873672.27 frames. ], batch size: 92, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:43:06,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1618610.0, ans=0.125 2024-08-12 11:43:17,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1618610.0, ans=0.1 2024-08-12 11:43:21,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1618610.0, ans=0.2 2024-08-12 11:43:24,846 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. 
limit=15.0 2024-08-12 11:43:53,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1618810.0, ans=0.1 2024-08-12 11:44:34,935 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 32 from LS+wenet, 21 from Vox, 18 from AS 2024-08-12 11:44:35,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1619010.0, ans=0.125 2024-08-12 11:44:35,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1619010.0, ans=0.125 2024-08-12 11:44:49,166 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2500, loss[loss=0.1074, beats_loss=0.01059, ecapa_loss=0.0002045, whisper_loss=0.09473, over 21765.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01072, ecapa_loss=0.0001763, whisper_loss=0.09366, over 3872678.99 frames. ], batch size: 90, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:44:50,263 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 27 from Vox, 29 from AS 2024-08-12 11:44:52,404 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 from AS 2024-08-12 11:44:59,457 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
25 from LS+wenet, 27 from Vox, 42 from AS 2024-08-12 11:45:10,358 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.514e+01 2.796e+01 3.106e+01 8.282e+01, threshold=5.592e+01, percent-clipped=2.0 2024-08-12 11:45:12,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1619210.0, ans=0.1 2024-08-12 11:45:41,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1619310.0, ans=0.2 2024-08-12 11:45:52,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1619410.0, ans=0.125 2024-08-12 11:45:54,789 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 24 from Vox, 26 from AS 2024-08-12 11:46:01,444 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.762e-03 2024-08-12 11:46:07,601 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 from AS 2024-08-12 11:46:12,250 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 17 from LS+wenet, 26 from Vox, 33 from AS 2024-08-12 11:46:13,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1619410.0, ans=0.1 2024-08-12 11:46:39,340 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2550, loss[loss=0.115, beats_loss=0.01287, ecapa_loss=0.0001762, whisper_loss=0.1004, over 22049.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01077, ecapa_loss=0.0001759, whisper_loss=0.09356, over 3879242.18 frames. 
], batch size: 89, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:47:15,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1619710.0, ans=0.1 2024-08-12 11:47:20,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1619810.0, ans=0.05 2024-08-12 11:47:40,736 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 24 from Vox, 29 from AS 2024-08-12 11:47:42,964 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2024-08-12 11:47:43,782 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 from AS 2024-08-12 11:47:47,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1619910.0, ans=0.0 2024-08-12 11:48:05,529 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2600, loss[loss=0.1067, beats_loss=0.01161, ecapa_loss=0.0001622, whisper_loss=0.0935, over 23131.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01084, ecapa_loss=0.000176, whisper_loss=0.09265, over 3839803.40 frames. ], batch size: 90, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:48:12,162 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 28 from LS+wenet, 18 from Vox, 14 from AS 2024-08-12 11:48:13,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2024-08-12 11:48:21,138 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.606e+01 2.871e+01 3.471e+01 6.871e+01, threshold=5.743e+01, percent-clipped=3.0 2024-08-12 11:48:23,242 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 24 from Vox, 32 from AS 2024-08-12 11:48:28,742 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 33 from Vox, 24 from AS 2024-08-12 11:48:40,724 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 17 from LS+wenet, 25 from Vox, 35 from AS 2024-08-12 11:49:04,731 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 23 from Vox, 34 from AS 2024-08-12 11:49:24,659 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2650, loss[loss=0.09925, beats_loss=0.01166, ecapa_loss=0.0001982, whisper_loss=0.0856, over 22081.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01086, ecapa_loss=0.0001768, whisper_loss=0.09224, over 3849282.13 frames. ], batch size: 90, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:49:35,429 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 from AS 2024-08-12 11:49:37,136 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 from AS 2024-08-12 11:49:39,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1620710.0, ans=0.0 2024-08-12 11:49:40,544 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 from AS 2024-08-12 11:49:58,591 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2024-08-12 11:50:11,385 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.07 vs. limit=15.0 2024-08-12 11:50:21,225 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 22 from Vox, 41 from AS 2024-08-12 11:50:22,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1620910.0, ans=0.1 2024-08-12 11:50:25,547 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 30 from Vox, 35 from AS 2024-08-12 11:50:28,644 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 31 from LS+wenet, 17 from Vox, 30 from AS 2024-08-12 11:50:28,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1621010.0, ans=0.0 2024-08-12 11:50:42,303 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2700, loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001608, whisper_loss=0.09082, over 23015.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01098, ecapa_loss=0.0001755, whisper_loss=0.09205, over 3887515.60 frames. ], batch size: 93, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:50:42,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1621110.0, ans=0.2 2024-08-12 11:50:50,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1621110.0, ans=0.0 2024-08-12 11:50:58,973 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.509e+01 2.801e+01 3.158e+01 4.809e+01, threshold=5.602e+01, percent-clipped=0.0 2024-08-12 11:51:02,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.39 vs. limit=22.5 2024-08-12 11:51:09,921 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
20 from LS+wenet, 22 from Vox, 28 from AS 2024-08-12 11:51:18,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1621310.0, ans=0.2 2024-08-12 11:51:27,593 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 14 from Vox, 29 from AS 2024-08-12 11:51:44,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2024-08-12 11:51:53,411 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 22 from Vox, 25 from AS 2024-08-12 11:52:00,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0 2024-08-12 11:52:02,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2750, loss[loss=0.1116, beats_loss=0.007858, ecapa_loss=0.0002207, whisper_loss=0.1015, over 17902.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.0001771, whisper_loss=0.09187, over 3849679.93 frames. ], batch size: 71, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:52:10,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1621610.0, ans=0.1 2024-08-12 11:52:27,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1621710.0, ans=0.0 2024-08-12 11:52:33,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1621810.0, ans=0.1 2024-08-12 11:52:48,126 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
22 from LS+wenet, 26 from Vox, 40 from AS 2024-08-12 11:53:22,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2800, loss[loss=0.1092, beats_loss=0.01119, ecapa_loss=0.0001429, whisper_loss=0.09662, over 23163.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01095, ecapa_loss=0.0001761, whisper_loss=0.09141, over 3866923.49 frames. ], batch size: 90, lr: 5.33e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:53:31,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.57 vs. limit=15.0 2024-08-12 11:53:37,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.464e+01 2.680e+01 3.068e+01 4.016e+01, threshold=5.359e+01, percent-clipped=0.0 2024-08-12 11:53:56,311 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 17 from Vox, 39 from AS 2024-08-12 11:54:09,405 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 14 from Vox, 31 from AS 2024-08-12 11:54:11,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1622410.0, ans=0.0 2024-08-12 11:54:24,290 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 from AS 2024-08-12 11:54:36,135 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 from AS 2024-08-12 11:54:36,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-12 11:54:38,195 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 from AS 2024-08-12 11:54:42,097 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
38 from LS+wenet, 18 from Vox, 35 from AS 2024-08-12 11:54:43,474 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2850, loss[loss=0.131, beats_loss=0.008666, ecapa_loss=0.0001683, whisper_loss=0.1206, over 23551.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.0001753, whisper_loss=0.09213, over 3879623.56 frames. ], batch size: 91, lr: 5.32e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:54:49,868 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 from AS 2024-08-12 11:55:16,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1622810.0, ans=0.0 2024-08-12 11:55:56,684 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2024-08-12 11:56:00,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1623010.0, ans=15.0 2024-08-12 11:56:05,001 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2900, loss[loss=0.09609, beats_loss=0.01146, ecapa_loss=0.0001559, whisper_loss=0.08307, over 20753.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01102, ecapa_loss=0.0001758, whisper_loss=0.09237, over 3886532.46 frames. ], batch size: 82, lr: 5.32e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:56:07,821 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
25 from LS+wenet, 20 from Vox, 42 from AS 2024-08-12 11:56:18,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1623110.0, ans=0.125 2024-08-12 11:56:20,727 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.518e+01 2.817e+01 3.035e+01 4.423e+01, threshold=5.633e+01, percent-clipped=0.0 2024-08-12 11:56:38,535 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.74 vs. limit=22.5 2024-08-12 11:57:01,682 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 from AS 2024-08-12 11:57:10,061 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.92 vs. limit=15.0 2024-08-12 11:57:11,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1623510.0, ans=0.125 2024-08-12 11:57:13,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1623510.0, ans=0.05 2024-08-12 11:57:25,072 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 2950, loss[loss=0.1005, beats_loss=0.01176, ecapa_loss=0.000216, whisper_loss=0.08658, over 18054.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01099, ecapa_loss=0.0001775, whisper_loss=0.09225, over 3866818.97 frames. ], batch size: 79, lr: 5.32e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:57:27,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-08-12 11:57:53,183 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 15 from Vox, 47 from AS 2024-08-12 11:57:54,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1623710.0, ans=0.125 2024-08-12 11:58:07,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1623810.0, ans=0.1 2024-08-12 11:58:22,838 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 22 from Vox, 23 from AS 2024-08-12 11:58:25,594 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 25 from Vox, 31 from AS 2024-08-12 11:58:40,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1624010.0, ans=0.125 2024-08-12 11:58:44,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3000, loss[loss=0.1204, beats_loss=0.01093, ecapa_loss=0.0001623, whisper_loss=0.1079, over 19484.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01104, ecapa_loss=0.0001773, whisper_loss=0.09193, over 3868741.16 frames. ], batch size: 77, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:58:44,663 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 11:59:25,751 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on ASR_libri: loss=0.256, beats_loss=0, ecapa_loss=0.0005941, whisper_loss=0.2501, over 922467.00 frames. 2024-08-12 11:59:45,020 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on SV_voxceleb1: loss=0.00471, beats_loss=0, ecapa_loss=0.000471, whisper_loss=0, over 939242.00 frames. 2024-08-12 12:01:31,279 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.9907, 4.8207, 4.2193, 4.5740], device='cuda:2') 2024-08-12 12:01:46,966 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on AT_audioset: loss=0.02429, beats_loss=0.02429, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 12:01:46,971 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 12:02:00,116 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 from AS 2024-08-12 12:02:01,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-12 12:02:03,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1624210.0, ans=0.0 2024-08-12 12:02:03,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.553e+01 2.970e+01 3.483e+01 4.771e+01, threshold=5.939e+01, percent-clipped=0.0 2024-08-12 12:02:07,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1624210.0, ans=0.0 2024-08-12 12:02:12,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2024-08-12 12:02:36,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1624410.0, ans=0.07 2024-08-12 12:02:46,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1624410.0, ans=0.125 2024-08-12 12:03:05,073 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3050, loss[loss=0.1056, beats_loss=0.01282, ecapa_loss=0.0001547, whisper_loss=0.09128, over 22647.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01106, ecapa_loss=0.0001763, whisper_loss=0.09217, over 3901163.34 frames. ], batch size: 90, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:03:06,956 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
24 from LS+wenet, 21 from Vox, 48 from AS 2024-08-12 12:03:07,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1624610.0, ans=0.125 2024-08-12 12:03:08,283 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 from AS 2024-08-12 12:03:17,477 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2024-08-12 12:03:27,371 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.29 vs. limit=10.0 2024-08-12 12:03:29,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1624710.0, ans=0.0 2024-08-12 12:03:52,234 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0 2024-08-12 12:03:57,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1624910.0, ans=0.0 2024-08-12 12:04:25,251 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3100, loss[loss=0.09346, beats_loss=0.01078, ecapa_loss=0.0002213, whisper_loss=0.08046, over 15057.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01111, ecapa_loss=0.0001757, whisper_loss=0.09206, over 3864114.70 frames. ], batch size: 63, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:04:43,008 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.511e+01 2.833e+01 3.211e+01 6.314e+01, threshold=5.667e+01, percent-clipped=1.0 2024-08-12 12:05:14,241 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 26 from Vox, 23 from AS 2024-08-12 12:05:19,623 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
23 from LS+wenet, 32 from Vox, 33 from AS 2024-08-12 12:05:29,934 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 16 from Vox, 26 from AS 2024-08-12 12:05:33,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1625510.0, ans=0.125 2024-08-12 12:05:37,616 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.71 vs. limit=22.5 2024-08-12 12:05:41,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.70 vs. limit=15.0 2024-08-12 12:05:42,634 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 from AS 2024-08-12 12:05:44,000 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3150, loss[loss=0.1057, beats_loss=0.01017, ecapa_loss=0.0001677, whisper_loss=0.09386, over 19257.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01108, ecapa_loss=0.0001757, whisper_loss=0.09201, over 3862286.86 frames. ], batch size: 76, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:05:49,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1625610.0, ans=0.2 2024-08-12 12:06:00,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1625710.0, ans=0.125 2024-08-12 12:06:51,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1626010.0, ans=0.125 2024-08-12 12:07:01,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. 
limit=15.0 2024-08-12 12:07:03,480 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3200, loss[loss=0.06811, beats_loss=0.01557, ecapa_loss=0.0002133, whisper_loss=0.05041, over 16066.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01107, ecapa_loss=0.0001762, whisper_loss=0.09196, over 3847110.57 frames. ], batch size: 71, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:07:03,700 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 from AS 2024-08-12 12:07:03,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1626110.0, ans=0.125 2024-08-12 12:07:07,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1626110.0, ans=0.125 2024-08-12 12:07:19,540 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 from AS 2024-08-12 12:07:21,131 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.397e+01 2.776e+01 3.062e+01 4.690e+01, threshold=5.551e+01, percent-clipped=0.0 2024-08-12 12:07:23,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1626210.0, ans=0.1 2024-08-12 12:07:24,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1626210.0, ans=0.1 2024-08-12 12:07:30,857 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 21 from Vox, 31 from AS 2024-08-12 12:07:34,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2024-08-12 12:07:44,077 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 22 from Vox, 35 from AS 2024-08-12 12:07:47,821 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 32 from Vox, 28 from AS 2024-08-12 12:07:52,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1626410.0, ans=0.1 2024-08-12 12:07:57,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.34 vs. limit=15.0 2024-08-12 12:08:21,800 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 from AS 2024-08-12 12:08:22,507 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2024-08-12 12:08:22,876 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3250, loss[loss=0.1248, beats_loss=0.0119, ecapa_loss=0.000156, whisper_loss=0.1113, over 23032.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01108, ecapa_loss=0.0001758, whisper_loss=0.09219, over 3878443.86 frames. ], batch size: 90, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:08:35,305 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 27 from Vox, 30 from AS 2024-08-12 12:08:40,226 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 28 from Vox, 38 from AS 2024-08-12 12:08:42,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1626710.0, ans=0.0 2024-08-12 12:08:58,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1626810.0, ans=0.125 2024-08-12 12:08:59,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1626810.0, ans=0.07 2024-08-12 12:09:10,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1626910.0, ans=0.0 2024-08-12 12:09:12,056 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 12 from Vox, 31 from AS 2024-08-12 12:09:19,579 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.60 vs. limit=15.0 2024-08-12 12:09:21,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.03 vs. limit=10.0 2024-08-12 12:09:32,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1627010.0, ans=0.07 2024-08-12 12:09:39,322 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 16 from Vox, 21 from AS 2024-08-12 12:09:41,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2024-08-12 12:09:42,092 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3300, loss[loss=0.1164, beats_loss=0.01081, ecapa_loss=0.0001515, whisper_loss=0.1041, over 23012.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01097, ecapa_loss=0.0001778, whisper_loss=0.09287, over 3919262.77 frames. 
], batch size: 88, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:09:42,868 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 12:09:52,594 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-12 12:09:58,680 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.622e+01 3.065e+01 3.686e+01 1.090e+02, threshold=6.129e+01, percent-clipped=1.0 2024-08-12 12:10:08,159 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 12:10:20,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1627310.0, ans=0.125 2024-08-12 12:10:28,782 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-12 12:10:29,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1627410.0, ans=0.125 2024-08-12 12:10:31,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1627410.0, ans=0.125 2024-08-12 12:10:57,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=22.5 2024-08-12 12:10:59,341 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3350, loss[loss=0.1304, beats_loss=0.009708, ecapa_loss=0.0002283, whisper_loss=0.1184, over 15473.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01096, ecapa_loss=0.0001775, whisper_loss=0.09296, over 3897632.02 frames. 
], batch size: 63, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:11:35,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1627810.0, ans=0.125 2024-08-12 12:11:40,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1627810.0, ans=0.2 2024-08-12 12:11:58,354 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 12:12:02,761 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-08-12 12:12:17,491 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3400, loss[loss=0.09312, beats_loss=0.0128, ecapa_loss=0.0001404, whisper_loss=0.07892, over 16397.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.011, ecapa_loss=0.0001761, whisper_loss=0.09253, over 3892492.46 frames. ], batch size: 66, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:12:18,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1628110.0, ans=0.2 2024-08-12 12:12:25,929 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-12 12:12:34,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1628210.0, ans=0.0 2024-08-12 12:12:35,660 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.455e+01 2.782e+01 3.017e+01 1.106e+02, threshold=5.563e+01, percent-clipped=1.0 2024-08-12 12:12:40,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1628210.0, ans=0.035 2024-08-12 12:12:58,933 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 12:13:13,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1628410.0, ans=0.125 2024-08-12 12:13:21,181 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 12:13:22,728 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 12:13:28,763 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-12 12:13:36,113 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3450, loss[loss=0.1015, beats_loss=0.01129, ecapa_loss=0.0001526, whisper_loss=0.08871, over 22604.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01094, ecapa_loss=0.0001768, whisper_loss=0.09282, over 3907295.54 frames. ], batch size: 89, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:13:41,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1628610.0, ans=0.09899494936611666 2024-08-12 12:13:41,759 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=12.0 2024-08-12 12:14:18,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1628810.0, ans=0.125 2024-08-12 12:14:28,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1628910.0, ans=0.0 2024-08-12 12:14:29,274 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2024-08-12 12:14:47,645 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
18 from LS+wenet, 34 from Vox, 40 fro AS 2024-08-12 12:14:50,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2024-08-12 12:14:53,507 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3500, loss[loss=0.1097, beats_loss=0.008248, ecapa_loss=0.0002179, whisper_loss=0.09922, over 13787.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01095, ecapa_loss=0.0001769, whisper_loss=0.0928, over 3862281.41 frames. ], batch size: 56, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:15:08,167 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 12:15:08,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1629210.0, ans=0.125 2024-08-12 12:15:10,380 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.491e+01 2.788e+01 3.215e+01 5.809e+01, threshold=5.577e+01, percent-clipped=2.0 2024-08-12 12:15:14,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1629210.0, ans=0.0 2024-08-12 12:15:32,044 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.55 vs. limit=10.0 2024-08-12 12:15:53,119 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 23 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 12:16:03,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1629510.0, ans=0.0 2024-08-12 12:16:12,205 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3550, loss[loss=0.07647, beats_loss=0.01472, ecapa_loss=0.0001769, whisper_loss=0.05998, over 18750.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01094, ecapa_loss=0.0001766, whisper_loss=0.09234, over 3864405.20 frames. 
], batch size: 78, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:16:17,032 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 32 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 12:16:23,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1629610.0, ans=0.0 2024-08-12 12:16:32,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1629710.0, ans=0.125 2024-08-12 12:16:51,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1629810.0, ans=0.125 2024-08-12 12:17:06,972 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 27 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 12:17:18,431 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 12:17:20,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1630010.0, ans=0.1 2024-08-12 12:17:20,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2024-08-12 12:17:23,186 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 12:17:28,984 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3600, loss[loss=0.09727, beats_loss=0.01127, ecapa_loss=0.0001855, whisper_loss=0.08415, over 13857.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.011, ecapa_loss=0.0001762, whisper_loss=0.09235, over 3887535.70 frames. ], batch size: 55, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:17:32,034 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 12:17:36,584 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
27 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 12:17:45,453 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.089e+01 2.537e+01 2.866e+01 3.271e+01 6.335e+01, threshold=5.732e+01, percent-clipped=1.0 2024-08-12 12:17:47,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1630210.0, ans=0.0 2024-08-12 12:17:56,325 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-12 12:18:08,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1630310.0, ans=0.125 2024-08-12 12:18:08,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1630310.0, ans=0.125 2024-08-12 12:18:10,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1630310.0, ans=0.0 2024-08-12 12:18:13,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1630410.0, ans=0.2 2024-08-12 12:18:22,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1630410.0, ans=0.125 2024-08-12 12:18:46,542 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3650, loss[loss=0.1075, beats_loss=0.009188, ecapa_loss=0.0001761, whisper_loss=0.09654, over 17159.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01098, ecapa_loss=0.0001769, whisper_loss=0.09211, over 3893197.96 frames. 
], batch size: 66, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:18:47,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1630610.0, ans=0.0 2024-08-12 12:18:53,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1630610.0, ans=0.1 2024-08-12 12:18:57,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1630610.0, ans=0.125 2024-08-12 12:19:12,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1630710.0, ans=0.1 2024-08-12 12:19:24,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1630810.0, ans=0.0 2024-08-12 12:19:27,097 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 12:19:52,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. limit=10.0 2024-08-12 12:20:01,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1631010.0, ans=0.125 2024-08-12 12:20:05,293 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3700, loss[loss=0.1055, beats_loss=0.01158, ecapa_loss=0.0001697, whisper_loss=0.09218, over 23078.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01097, ecapa_loss=0.0001769, whisper_loss=0.09272, over 3877842.55 frames. 
], batch size: 91, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:20:17,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1631110.0, ans=0.125 2024-08-12 12:20:23,217 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.690e+01 3.090e+01 3.461e+01 6.737e+01, threshold=6.180e+01, percent-clipped=1.0 2024-08-12 12:20:30,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1631210.0, ans=0.0 2024-08-12 12:20:34,425 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 12:20:40,230 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 12:20:42,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1631310.0, ans=0.0 2024-08-12 12:20:51,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1631310.0, ans=0.0 2024-08-12 12:20:53,561 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 12:21:06,474 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-12 12:21:17,109 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 12:21:17,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1631510.0, ans=0.125 2024-08-12 12:21:24,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3750, loss[loss=0.107, beats_loss=0.01106, ecapa_loss=0.0001637, whisper_loss=0.09431, over 21094.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01099, ecapa_loss=0.000176, whisper_loss=0.09293, over 3874326.13 frames. 
], batch size: 83, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:21:42,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1631710.0, ans=0.125 2024-08-12 12:21:46,349 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=22.5 2024-08-12 12:21:51,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1631710.0, ans=0.125 2024-08-12 12:21:53,660 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 12:22:03,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1631810.0, ans=10.0 2024-08-12 12:22:12,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1631910.0, ans=0.125 2024-08-12 12:22:13,955 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-08-12 12:22:23,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=1631910.0, ans=22.5 2024-08-12 12:22:44,848 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3800, loss[loss=0.1072, beats_loss=0.01116, ecapa_loss=0.0001522, whisper_loss=0.09449, over 19753.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01105, ecapa_loss=0.0001771, whisper_loss=0.09221, over 3911452.39 frames. 
], batch size: 77, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:23:02,421 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.541e+01 2.857e+01 3.346e+01 7.613e+01, threshold=5.713e+01, percent-clipped=1.0 2024-08-12 12:23:07,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1632210.0, ans=0.125 2024-08-12 12:23:07,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2024-08-12 12:23:08,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1632210.0, ans=0.1 2024-08-12 12:23:31,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1632410.0, ans=0.05 2024-08-12 12:23:42,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1632410.0, ans=0.125 2024-08-12 12:24:01,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1632610.0, ans=0.09899494936611666 2024-08-12 12:24:01,998 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3850, loss[loss=0.09823, beats_loss=0.01305, ecapa_loss=0.0002047, whisper_loss=0.08313, over 13849.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01108, ecapa_loss=0.0001781, whisper_loss=0.0921, over 3881662.74 frames. ], batch size: 58, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:24:14,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1632610.0, ans=0.1 2024-08-12 12:24:22,726 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
20 from LS+wenet, 12 from Vox, 24 from AS 2024-08-12 12:24:22,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1632710.0, ans=0.2 2024-08-12 12:24:27,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1632710.0, ans=0.125 2024-08-12 12:24:43,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1632810.0, ans=0.05 2024-08-12 12:24:49,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1632910.0, ans=0.2 2024-08-12 12:24:54,040 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2024-08-12 12:24:57,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1632910.0, ans=0.125 2024-08-12 12:25:17,687 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 from AS 2024-08-12 12:25:20,739 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 from AS 2024-08-12 12:25:22,034 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3900, loss[loss=0.09932, beats_loss=0.008959, ecapa_loss=0.0002262, whisper_loss=0.0881, over 12907.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01105, ecapa_loss=0.0001776, whisper_loss=0.09279, over 3899329.63 frames. 
], batch size: 55, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:25:25,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1633110.0, ans=0.125 2024-08-12 12:25:39,263 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.512e+01 2.803e+01 3.159e+01 7.102e+01, threshold=5.607e+01, percent-clipped=1.0 2024-08-12 12:25:40,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1633210.0, ans=0.2 2024-08-12 12:25:46,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1633210.0, ans=0.1 2024-08-12 12:25:54,068 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 25 from Vox, 32 from AS 2024-08-12 12:25:54,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1633310.0, ans=0.1 2024-08-12 12:26:07,136 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 24 from Vox, 18 from AS 2024-08-12 12:26:17,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1633410.0, ans=0.0 2024-08-12 12:26:41,637 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 3950, loss[loss=0.07678, beats_loss=0.01077, ecapa_loss=0.0001827, whisper_loss=0.06418, over 17448.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01101, ecapa_loss=0.0001775, whisper_loss=0.09296, over 3894061.68 frames. 
], batch size: 71, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:26:43,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1633610.0, ans=0.0 2024-08-12 12:26:47,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1633610.0, ans=0.125 2024-08-12 12:26:54,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1633610.0, ans=0.125 2024-08-12 12:26:54,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1633610.0, ans=0.04949747468305833 2024-08-12 12:26:55,044 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 from AS 2024-08-12 12:27:02,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1633710.0, ans=0.0 2024-08-12 12:27:37,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1633910.0, ans=0.125 2024-08-12 12:27:42,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1633910.0, ans=0.0 2024-08-12 12:27:46,358 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 from AS 2024-08-12 12:27:51,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2024-08-12 12:28:00,441 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4000, loss[loss=0.09932, beats_loss=0.01033, ecapa_loss=0.0001435, whisper_loss=0.08755, over 16897.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01101, ecapa_loss=0.0001774, whisper_loss=0.0924, over 3869914.82 frames. 
], batch size: 65, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:28:00,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1634110.0, ans=0.125 2024-08-12 12:28:12,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1634110.0, ans=0.0 2024-08-12 12:28:15,622 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 15 from Vox, 36 from AS 2024-08-12 12:28:16,625 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.585e+01 2.882e+01 3.381e+01 6.617e+01, threshold=5.764e+01, percent-clipped=3.0 2024-08-12 12:28:21,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1634210.0, ans=0.0 2024-08-12 12:29:07,513 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 from AS 2024-08-12 12:29:14,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1634510.0, ans=0.125 2024-08-12 12:29:16,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1634510.0, ans=0.2 2024-08-12 12:29:16,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1634510.0, ans=10.0 2024-08-12 12:29:18,880 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4050, loss[loss=0.1148, beats_loss=0.009169, ecapa_loss=0.0001728, whisper_loss=0.1039, over 18784.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01096, ecapa_loss=0.0001771, whisper_loss=0.09236, over 3842454.36 frames. ], batch size: 70, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:29:33,275 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
18 from LS+wenet, 24 from Vox, 36 from AS 2024-08-12 12:29:33,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2024-08-12 12:29:52,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2024-08-12 12:29:58,385 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 from AS 2024-08-12 12:30:12,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-12 12:30:22,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1635010.0, ans=0.0 2024-08-12 12:30:29,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1635010.0, ans=0.125 2024-08-12 12:30:39,527 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4100, loss[loss=0.1309, beats_loss=0.007953, ecapa_loss=0.00017, whisper_loss=0.1212, over 19769.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0111, ecapa_loss=0.0001761, whisper_loss=0.09166, over 3875773.40 frames. 
], batch size: 74, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:30:47,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1635110.0, ans=0.125 2024-08-12 12:30:49,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1635110.0, ans=0.125 2024-08-12 12:30:56,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.490e+01 2.729e+01 3.052e+01 9.662e+01, threshold=5.458e+01, percent-clipped=1.0 2024-08-12 12:31:09,201 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2024-08-12 12:31:43,976 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 from AS 2024-08-12 12:31:45,585 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 from AS 2024-08-12 12:32:00,366 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4150, loss[loss=0.1066, beats_loss=0.01075, ecapa_loss=0.0002052, whisper_loss=0.09375, over 18787.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01112, ecapa_loss=0.0001767, whisper_loss=0.0918, over 3881246.78 frames. ], batch size: 76, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:32:14,949 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 from AS 2024-08-12 12:32:34,699 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 19 from Vox, 45 from AS 2024-08-12 12:32:45,053 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. 
limit=12.0 2024-08-12 12:32:47,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1635910.0, ans=0.04949747468305833 2024-08-12 12:33:07,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1636010.0, ans=0.1 2024-08-12 12:33:12,093 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 30 from LS+wenet, 18 from Vox, 24 from AS 2024-08-12 12:33:20,529 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4200, loss[loss=0.0878, beats_loss=0.01369, ecapa_loss=0.0001199, whisper_loss=0.07292, over 17526.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0111, ecapa_loss=0.0001768, whisper_loss=0.0916, over 3892245.00 frames. ], batch size: 69, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:33:22,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1636110.0, ans=0.0 2024-08-12 12:33:25,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1636110.0, ans=0.125 2024-08-12 12:33:37,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.467e+01 2.734e+01 3.043e+01 4.289e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-12 12:33:49,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1636210.0, ans=0.1 2024-08-12 12:33:49,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1636210.0, ans=0.0 2024-08-12 12:33:51,869 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
32 from LS+wenet, 27 from Vox, 28 from AS 2024-08-12 12:33:55,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1636310.0, ans=0.1 2024-08-12 12:33:58,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1636310.0, ans=0.125 2024-08-12 12:34:09,279 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 from AS 2024-08-12 12:34:19,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1636410.0, ans=0.1 2024-08-12 12:34:39,198 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4250, loss[loss=0.1055, beats_loss=0.01093, ecapa_loss=0.0001518, whisper_loss=0.09303, over 20135.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01104, ecapa_loss=0.0001764, whisper_loss=0.09161, over 3889272.68 frames. ], batch size: 77, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:34:40,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1636610.0, ans=0.125 2024-08-12 12:34:44,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1636610.0, ans=0.125 2024-08-12 12:34:56,525 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 from AS 2024-08-12 12:34:58,199 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 from AS 2024-08-12 12:35:02,527 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
20 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 12:35:26,093 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 12:35:32,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1636910.0, ans=0.125 2024-08-12 12:35:40,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=12.0 2024-08-12 12:35:58,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4300, loss[loss=0.1043, beats_loss=0.01082, ecapa_loss=0.0002058, whisper_loss=0.09137, over 13385.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.000177, whisper_loss=0.09131, over 3844377.68 frames. ], batch size: 55, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:36:01,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1637110.0, ans=0.125 2024-08-12 12:36:15,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.507e+01 2.747e+01 3.144e+01 4.891e+01, threshold=5.494e+01, percent-clipped=0.0 2024-08-12 12:36:22,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1637210.0, ans=0.1 2024-08-12 12:36:24,056 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 12:36:28,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1637310.0, ans=0.2 2024-08-12 12:36:43,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1637310.0, ans=6.0 2024-08-12 12:36:44,279 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-12 12:36:46,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0 2024-08-12 12:36:47,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1637410.0, ans=0.0 2024-08-12 12:36:48,462 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 12:36:48,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1637410.0, ans=0.1 2024-08-12 12:37:14,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1637510.0, ans=0.1 2024-08-12 12:37:16,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4350, loss[loss=0.1048, beats_loss=0.0125, ecapa_loss=0.0001507, whisper_loss=0.0908, over 23340.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01095, ecapa_loss=0.0001762, whisper_loss=0.09153, over 3855600.48 frames. ], batch size: 93, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:37:34,333 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 12:37:54,311 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
19 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-12 12:38:02,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1637810.0, ans=0.125 2024-08-12 12:38:04,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1637910.0, ans=0.125 2024-08-12 12:38:24,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1638010.0, ans=0.125 2024-08-12 12:38:24,865 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.08 vs. limit=22.5 2024-08-12 12:38:28,484 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=12.0 2024-08-12 12:38:36,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4400, loss[loss=0.1003, beats_loss=0.01254, ecapa_loss=0.0001513, whisper_loss=0.08624, over 15083.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01095, ecapa_loss=0.0001761, whisper_loss=0.09201, over 3871808.50 frames. ], batch size: 59, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:38:44,836 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=12.0 2024-08-12 12:38:55,047 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.518e+01 2.794e+01 3.242e+01 9.315e+01, threshold=5.589e+01, percent-clipped=2.0 2024-08-12 12:38:58,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1638210.0, ans=0.125 2024-08-12 12:39:13,724 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
26 from LS+wenet, 16 from Vox, 15 fro AS 2024-08-12 12:39:20,481 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 12:39:26,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1638410.0, ans=0.0 2024-08-12 12:39:27,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1638410.0, ans=0.125 2024-08-12 12:39:28,956 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 12:39:34,059 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-12 12:39:59,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4450, loss[loss=0.1052, beats_loss=0.0112, ecapa_loss=0.0001602, whisper_loss=0.09245, over 14350.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.011, ecapa_loss=0.000175, whisper_loss=0.09177, over 3899558.56 frames. ], batch size: 56, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:39:59,348 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 31 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 12:40:07,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1638610.0, ans=0.125 2024-08-12 12:40:30,186 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-12 12:41:04,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1639010.0, ans=0.125 2024-08-12 12:41:08,939 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 12:41:16,188 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.60 vs. 
limit=10.0 2024-08-12 12:41:16,280 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.01 vs. limit=10.0 2024-08-12 12:41:17,679 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2024-08-12 12:41:18,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1639110.0, ans=0.1 2024-08-12 12:41:19,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4500, loss[loss=0.1065, beats_loss=0.01102, ecapa_loss=0.0001665, whisper_loss=0.09385, over 17862.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0111, ecapa_loss=0.0001753, whisper_loss=0.09145, over 3883132.72 frames. ], batch size: 68, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:41:37,358 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.496e+01 3.007e+01 3.529e+01 6.889e+01, threshold=6.014e+01, percent-clipped=3.0 2024-08-12 12:41:48,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1639210.0, ans=0.0 2024-08-12 12:42:16,064 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 12:42:23,666 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 12:42:38,274 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4550, loss[loss=0.09699, beats_loss=0.01114, ecapa_loss=0.0001861, whisper_loss=0.08399, over 21697.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01107, ecapa_loss=0.0001756, whisper_loss=0.09193, over 3901797.39 frames. ], batch size: 92, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:42:41,905 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 12:42:46,238 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 17 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 12:43:18,222 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 28 from LS+wenet, 22 from Vox, 16 fro AS 2024-08-12 12:43:24,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1639910.0, ans=0.0 2024-08-12 12:43:43,602 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2024-08-12 12:43:45,598 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 12:43:57,992 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4600, loss[loss=0.106, beats_loss=0.01144, ecapa_loss=0.0002014, whisper_loss=0.09251, over 21388.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01104, ecapa_loss=0.0001757, whisper_loss=0.0918, over 3907395.45 frames. ], batch size: 93, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:43:58,932 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. 
limit=15.0 2024-08-12 12:44:02,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1640110.0, ans=0.0 2024-08-12 12:44:09,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1640110.0, ans=0.1 2024-08-12 12:44:14,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.449e+01 2.715e+01 3.086e+01 6.580e+01, threshold=5.431e+01, percent-clipped=1.0 2024-08-12 12:44:17,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2024-08-12 12:44:29,586 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 12:44:59,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1640410.0, ans=0.025 2024-08-12 12:45:04,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1640510.0, ans=0.1 2024-08-12 12:45:09,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1640510.0, ans=0.0 2024-08-12 12:45:12,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1640510.0, ans=0.125 2024-08-12 12:45:16,642 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4650, loss[loss=0.1145, beats_loss=0.01076, ecapa_loss=0.0001832, whisper_loss=0.1019, over 22288.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01116, ecapa_loss=0.000175, whisper_loss=0.09096, over 3906005.08 frames. 
], batch size: 92, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:45:26,621 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.279e+05 2024-08-12 12:45:42,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1640710.0, ans=0.1 2024-08-12 12:45:42,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.33 vs. limit=10.0 2024-08-12 12:45:48,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1640810.0, ans=0.09899494936611666 2024-08-12 12:45:53,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1640810.0, ans=0.125 2024-08-12 12:46:21,590 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.439e-01 2024-08-12 12:46:36,286 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4700, loss[loss=0.09877, beats_loss=0.01028, ecapa_loss=0.0002003, whisper_loss=0.08649, over 16995.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01106, ecapa_loss=0.0001757, whisper_loss=0.09156, over 3902555.59 frames. 
], batch size: 69, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:46:47,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1641110.0, ans=0.125 2024-08-12 12:46:52,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1641210.0, ans=0.0 2024-08-12 12:46:54,756 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.509e+01 2.776e+01 3.112e+01 6.525e+01, threshold=5.552e+01, percent-clipped=1.0 2024-08-12 12:47:03,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1641210.0, ans=0.1 2024-08-12 12:47:07,696 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-12 12:47:13,845 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.775e+00 2024-08-12 12:47:24,886 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 12:47:28,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2024-08-12 12:47:30,790 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 12:47:41,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1641510.0, ans=0.125 2024-08-12 12:47:55,002 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4750, loss[loss=0.08348, beats_loss=0.01219, ecapa_loss=0.0001643, whisper_loss=0.06964, over 18289.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.000175, whisper_loss=0.09195, over 3875074.13 frames. 
], batch size: 75, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:48:00,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1641610.0, ans=0.025 2024-08-12 12:48:18,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1641710.0, ans=0.125 2024-08-12 12:48:25,878 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 12:48:31,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1641810.0, ans=0.125 2024-08-12 12:48:40,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1641910.0, ans=0.07 2024-08-12 12:49:00,666 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 33 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 12:49:10,821 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4800, loss[loss=0.09592, beats_loss=0.01165, ecapa_loss=0.0001656, whisper_loss=0.08262, over 20881.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01106, ecapa_loss=0.0001758, whisper_loss=0.09146, over 3868262.13 frames. ], batch size: 79, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:49:12,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1642110.0, ans=0.125 2024-08-12 12:49:18,633 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
34 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 12:49:28,546 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.489e+01 2.813e+01 3.178e+01 7.863e+01, threshold=5.627e+01, percent-clipped=2.0 2024-08-12 12:49:35,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1642210.0, ans=0.0 2024-08-12 12:49:39,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1642210.0, ans=0.05 2024-08-12 12:49:39,252 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.09 vs. limit=10.0 2024-08-12 12:49:40,323 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.551e+00 2024-08-12 12:49:41,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1642310.0, ans=0.0 2024-08-12 12:49:45,944 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 15 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 12:50:00,531 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.102e+00 2024-08-12 12:50:28,283 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4850, loss[loss=0.09128, beats_loss=0.009836, ecapa_loss=0.0001743, whisper_loss=0.0797, over 13923.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01107, ecapa_loss=0.0001762, whisper_loss=0.09198, over 3884684.49 frames. 
], batch size: 55, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:50:38,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1642610.0, ans=0.1 2024-08-12 12:50:44,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1642710.0, ans=0.125 2024-08-12 12:51:21,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1642910.0, ans=0.125 2024-08-12 12:51:23,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0 2024-08-12 12:51:37,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1643010.0, ans=0.125 2024-08-12 12:51:43,063 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 26 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-12 12:51:47,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4900, loss[loss=0.1138, beats_loss=0.01169, ecapa_loss=0.0001751, whisper_loss=0.1004, over 22505.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01106, ecapa_loss=0.0001765, whisper_loss=0.09228, over 3885505.76 frames. 
], batch size: 91, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:51:51,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1643110.0, ans=0.125 2024-08-12 12:52:01,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1643210.0, ans=0.125 2024-08-12 12:52:03,728 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.565e+01 2.777e+01 3.230e+01 5.434e+01, threshold=5.553e+01, percent-clipped=0.0 2024-08-12 12:52:19,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.56 vs. limit=15.0 2024-08-12 12:52:32,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1643410.0, ans=0.0 2024-08-12 12:52:33,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1643410.0, ans=0.125 2024-08-12 12:52:38,474 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 12:52:43,422 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.08 vs. limit=22.5 2024-08-12 12:52:50,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1643510.0, ans=0.07 2024-08-12 12:52:57,007 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 12:53:02,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 4950, loss[loss=0.105, beats_loss=0.01132, ecapa_loss=0.0001539, whisper_loss=0.09212, over 14579.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01112, ecapa_loss=0.0001762, whisper_loss=0.09146, over 3867154.51 frames. ], batch size: 56, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:53:12,507 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 12:53:15,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1643610.0, ans=0.125 2024-08-12 12:53:30,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1643710.0, ans=0.125 2024-08-12 12:54:18,438 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.91 vs. limit=10.0 2024-08-12 12:54:19,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1644110.0, ans=0.0 2024-08-12 12:54:20,315 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5000, loss[loss=0.1187, beats_loss=0.01029, ecapa_loss=0.0001979, whisper_loss=0.1064, over 13943.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.0001765, whisper_loss=0.09246, over 3844886.50 frames. ], batch size: 55, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:54:21,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-08-12 12:54:24,820 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 21 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-12 12:54:34,226 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 12:54:36,908 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.385e+01 2.734e+01 3.105e+01 6.733e+01, threshold=5.467e+01, percent-clipped=3.0 2024-08-12 12:54:40,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1644210.0, ans=0.1 2024-08-12 12:54:43,965 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 32 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 12:54:49,029 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2024-08-12 12:55:02,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1644310.0, ans=0.0 2024-08-12 12:55:08,006 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 12:55:14,280 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 12:55:17,345 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 12:55:20,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1644510.0, ans=0.2 2024-08-12 12:55:37,809 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5050, loss[loss=0.08913, beats_loss=0.01139, ecapa_loss=0.0001973, whisper_loss=0.07577, over 16717.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01106, ecapa_loss=0.0001771, whisper_loss=0.09317, over 3874126.16 frames. 
], batch size: 73, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:55:38,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1644610.0, ans=0.0 2024-08-12 12:55:43,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1644610.0, ans=0.07 2024-08-12 12:55:58,457 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-12 12:56:06,654 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 12:56:32,381 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-12 12:56:39,096 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 12:56:55,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1645110.0, ans=0.04949747468305833 2024-08-12 12:56:56,006 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5100, loss[loss=0.107, beats_loss=0.01215, ecapa_loss=0.0002036, whisper_loss=0.09279, over 18699.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01108, ecapa_loss=0.0001768, whisper_loss=0.09273, over 3862494.53 frames. ], batch size: 77, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:56:56,410 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 12:57:04,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1645110.0, ans=0.1 2024-08-12 12:57:05,673 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 12:57:13,066 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.597e+01 2.875e+01 3.428e+01 8.355e+01, threshold=5.751e+01, percent-clipped=1.0 2024-08-12 12:57:34,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1645310.0, ans=0.0 2024-08-12 12:57:35,705 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 12:57:46,400 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 12:57:54,309 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.370e-01 2024-08-12 12:57:55,573 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 12:58:12,299 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5150, loss[loss=0.08852, beats_loss=0.01265, ecapa_loss=0.0001142, whisper_loss=0.07473, over 16213.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01108, ecapa_loss=0.0001758, whisper_loss=0.09256, over 3870533.87 frames. ], batch size: 62, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:58:22,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1645610.0, ans=0.0 2024-08-12 12:58:43,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1645810.0, ans=0.125 2024-08-12 12:58:49,863 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.03 vs. 
limit=22.5 2024-08-12 12:58:55,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1645910.0, ans=0.125 2024-08-12 12:58:56,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1645910.0, ans=0.125 2024-08-12 12:59:13,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1646010.0, ans=0.0 2024-08-12 12:59:17,763 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 12:59:24,458 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5200, loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001393, whisper_loss=0.09031, over 14894.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01107, ecapa_loss=0.0001756, whisper_loss=0.09269, over 3894971.40 frames. ], batch size: 55, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:59:35,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1646110.0, ans=0.125 2024-08-12 12:59:39,113 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.626e+01 2.923e+01 3.403e+01 3.236e+02, threshold=5.847e+01, percent-clipped=1.0 2024-08-12 12:59:43,689 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-12 13:00:12,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.43 vs. 
limit=15.0 2024-08-12 13:00:27,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1646510.0, ans=0.5 2024-08-12 13:00:27,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2024-08-12 13:00:28,484 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 13:00:32,378 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5250, loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0002347, whisper_loss=0.0898, over 20332.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01112, ecapa_loss=0.0001748, whisper_loss=0.09197, over 3892152.93 frames. ], batch size: 88, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:00:32,962 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.423e-02 2024-08-12 13:00:45,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1646710.0, ans=0.0 2024-08-12 13:00:46,731 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-12 13:00:47,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1646710.0, ans=0.125 2024-08-12 13:00:58,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1646810.0, ans=0.0 2024-08-12 13:01:25,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.29 vs. limit=15.0 2024-08-12 13:01:29,756 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
19 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 13:01:38,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5300, loss[loss=0.1238, beats_loss=0.008768, ecapa_loss=0.0001402, whisper_loss=0.1136, over 17815.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01096, ecapa_loss=0.000178, whisper_loss=0.09341, over 3924219.83 frames. ], batch size: 64, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:01:52,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2024-08-12 13:01:54,013 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.492e+01 2.766e+01 3.259e+01 2.039e+02, threshold=5.533e+01, percent-clipped=1.0 2024-08-12 13:01:54,255 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 13:01:55,890 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2024-08-12 13:02:04,916 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 13:02:14,080 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 13:02:15,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1647310.0, ans=0.1 2024-08-12 13:02:16,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1647410.0, ans=0.125 2024-08-12 13:02:20,735 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 13:02:31,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. 
limit=15.0 2024-08-12 13:02:43,806 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5350, loss[loss=0.1016, beats_loss=0.009407, ecapa_loss=0.0001695, whisper_loss=0.09052, over 14771.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01097, ecapa_loss=0.0001766, whisper_loss=0.0929, over 3907370.05 frames. ], batch size: 57, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:02:54,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1647610.0, ans=0.0 2024-08-12 13:03:04,881 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 13:03:10,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1647810.0, ans=0.125 2024-08-12 13:03:19,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1647810.0, ans=0.1 2024-08-12 13:03:25,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1647910.0, ans=0.125 2024-08-12 13:03:29,288 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 13:03:38,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1648010.0, ans=0.1 2024-08-12 13:03:48,459 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5400, loss[loss=0.09966, beats_loss=0.008235, ecapa_loss=0.000192, whisper_loss=0.0895, over 16503.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01098, ecapa_loss=0.0001754, whisper_loss=0.0928, over 3889559.36 frames. ], batch size: 64, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:03:49,911 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
25 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 13:03:50,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1648110.0, ans=0.125 2024-08-12 13:04:00,303 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 29 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-12 13:04:03,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1648210.0, ans=0.125 2024-08-12 13:04:03,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1648210.0, ans=0.0 2024-08-12 13:04:04,397 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.540e+01 2.809e+01 3.411e+01 5.713e+01, threshold=5.618e+01, percent-clipped=1.0 2024-08-12 13:04:42,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1648510.0, ans=0.025 2024-08-12 13:04:43,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1648510.0, ans=0.125 2024-08-12 13:04:54,124 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5450, loss[loss=0.08968, beats_loss=0.0105, ecapa_loss=0.0001336, whisper_loss=0.07785, over 16584.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.011, ecapa_loss=0.0001745, whisper_loss=0.0925, over 3892820.47 frames. ], batch size: 62, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:05:06,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1648710.0, ans=0.0 2024-08-12 13:05:19,756 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. 
limit=15.0 2024-08-12 13:05:52,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5 2024-08-12 13:05:52,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.61 vs. limit=22.5 2024-08-12 13:05:53,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1649010.0, ans=0.1 2024-08-12 13:05:59,814 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5500, loss[loss=0.1025, beats_loss=0.01179, ecapa_loss=0.0001528, whisper_loss=0.08922, over 18708.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.0001746, whisper_loss=0.09217, over 3876449.32 frames. ], batch size: 72, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:06:03,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1649110.0, ans=0.1 2024-08-12 13:06:15,284 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.546e+01 2.808e+01 3.382e+01 4.653e+01, threshold=5.615e+01, percent-clipped=0.0 2024-08-12 13:06:21,087 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 13:06:28,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1649310.0, ans=0.125 2024-08-12 13:06:32,904 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0 2024-08-12 13:06:33,497 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 13:06:33,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1649310.0, ans=0.04949747468305833 2024-08-12 13:06:38,645 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 13:06:38,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1649410.0, ans=0.0 2024-08-12 13:06:40,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1649410.0, ans=0.125 2024-08-12 13:07:00,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1649510.0, ans=0.1 2024-08-12 13:07:05,601 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5550, loss[loss=0.1147, beats_loss=0.01017, ecapa_loss=0.0001565, whisper_loss=0.103, over 19717.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01103, ecapa_loss=0.0001754, whisper_loss=0.09124, over 3872337.09 frames. ], batch size: 77, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:07:10,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1649610.0, ans=0.05 2024-08-12 13:07:17,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1649610.0, ans=0.125 2024-08-12 13:07:21,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1649710.0, ans=0.0 2024-08-12 13:07:37,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1649810.0, ans=0.025 2024-08-12 13:07:53,558 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 13:08:06,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1650010.0, ans=0.0 2024-08-12 13:08:11,897 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-12 13:08:21,597 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-12 13:08:23,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0 2024-08-12 13:08:26,038 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5600, loss[loss=0.08953, beats_loss=0.01125, ecapa_loss=0.0001698, whisper_loss=0.07658, over 13404.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01106, ecapa_loss=0.0001755, whisper_loss=0.09121, over 3878381.61 frames. ], batch size: 53, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:08:27,664 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 33 from Vox, 31 fro AS 2024-08-12 13:08:51,291 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.532e+01 2.834e+01 3.138e+01 6.030e+01, threshold=5.668e+01, percent-clipped=1.0 2024-08-12 13:09:02,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1650210.0, ans=0.1 2024-08-12 13:09:44,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1650510.0, ans=0.125 2024-08-12 13:09:49,150 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 13:09:55,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5650, loss[loss=0.08025, beats_loss=0.01248, ecapa_loss=0.0001465, whisper_loss=0.0663, over 22193.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01106, ecapa_loss=0.0001749, whisper_loss=0.09111, over 3890081.89 frames. ], batch size: 90, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:10:25,047 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 13:10:28,019 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-12 13:10:38,559 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 13:10:41,965 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 13:10:53,063 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.00 vs. limit=22.5 2024-08-12 13:10:55,261 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 13:11:13,197 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5700, loss[loss=0.1264, beats_loss=0.009817, ecapa_loss=0.000128, whisper_loss=0.1153, over 17810.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01111, ecapa_loss=0.0001757, whisper_loss=0.09118, over 3912660.33 frames. ], batch size: 64, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:11:31,711 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.530e+01 2.812e+01 3.253e+01 9.696e+01, threshold=5.623e+01, percent-clipped=1.0 2024-08-12 13:11:31,935 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 13:11:32,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1651210.0, ans=10.0 2024-08-12 13:11:38,184 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
23 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 13:12:10,619 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-12 13:12:19,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1651510.0, ans=0.0 2024-08-12 13:12:30,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5750, loss[loss=0.1097, beats_loss=0.01107, ecapa_loss=0.0001885, whisper_loss=0.09679, over 22973.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01106, ecapa_loss=0.0001773, whisper_loss=0.09199, over 3923457.64 frames. ], batch size: 92, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:12:55,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1651710.0, ans=0.125 2024-08-12 13:12:59,877 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 13:13:06,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1651810.0, ans=0.125 2024-08-12 13:13:14,840 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 13:13:45,837 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5800, loss[loss=0.07401, beats_loss=0.01305, ecapa_loss=0.0002079, whisper_loss=0.05888, over 14204.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01099, ecapa_loss=0.0001778, whisper_loss=0.09269, over 3890616.31 frames. ], batch size: 60, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:13:49,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1652110.0, ans=0.04949747468305833 2024-08-12 13:13:58,411 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 13:13:59,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.69 vs. limit=15.0 2024-08-12 13:14:04,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.480e+01 2.682e+01 3.175e+01 5.563e+01, threshold=5.365e+01, percent-clipped=0.0 2024-08-12 13:14:04,832 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-12 13:14:34,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0 2024-08-12 13:14:48,570 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-12 13:14:56,922 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-12 13:15:05,765 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5850, loss[loss=0.09207, beats_loss=0.008552, ecapa_loss=0.0002222, whisper_loss=0.0813, over 16295.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01104, ecapa_loss=0.0001761, whisper_loss=0.0926, over 3919709.89 frames. ], batch size: 66, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:15:06,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1652610.0, ans=0.0 2024-08-12 13:15:09,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1652610.0, ans=0.125 2024-08-12 13:15:35,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1652710.0, ans=0.125 2024-08-12 13:15:42,642 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 13:15:47,315 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 21 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 13:15:54,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1652910.0, ans=0.2 2024-08-12 13:15:57,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1652910.0, ans=0.125 2024-08-12 13:16:25,206 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5900, loss[loss=0.07246, beats_loss=0.01278, ecapa_loss=0.0001621, whisper_loss=0.05806, over 16850.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01103, ecapa_loss=0.0001758, whisper_loss=0.09286, over 3911076.15 frames. ], batch size: 68, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:16:25,907 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-08-12 13:16:35,423 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=15.0 2024-08-12 13:16:43,934 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.413e+01 2.728e+01 2.999e+01 4.140e+01, threshold=5.456e+01, percent-clipped=0.0 2024-08-12 13:16:46,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1653210.0, ans=0.0 2024-08-12 13:17:01,898 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 31 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 13:17:13,053 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
32 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-12 13:17:21,078 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-08-12 13:17:32,752 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 13:17:38,996 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 13:17:43,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 5950, loss[loss=0.1036, beats_loss=0.009838, ecapa_loss=0.0001588, whisper_loss=0.0922, over 17297.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01104, ecapa_loss=0.0001753, whisper_loss=0.09204, over 3885080.08 frames. ], batch size: 66, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:17:43,338 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 31 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 13:17:54,124 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 13:17:58,523 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 13:18:04,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1653710.0, ans=0.1 2024-08-12 13:18:05,137 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.09 vs. limit=10.0 2024-08-12 13:18:24,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=12.0 2024-08-12 13:18:27,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. 
limit=15.0 2024-08-12 13:18:37,666 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 13:18:42,851 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 35 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 13:18:54,800 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 13:18:57,195 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2024-08-12 13:19:02,503 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 13:19:03,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6000, loss[loss=0.1033, beats_loss=0.009289, ecapa_loss=0.0001522, whisper_loss=0.09248, over 16734.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01107, ecapa_loss=0.0001754, whisper_loss=0.09163, over 3879774.92 frames. ], batch size: 62, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:19:03,584 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 13:19:40,022 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on ASR_libri: loss=0.2551, beats_loss=0, ecapa_loss=0.0005888, whisper_loss=0.2492, over 922467.00 frames. 2024-08-12 13:19:46,858 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.6819, 1.1920, 1.7609, 2.3169], device='cuda:2') 2024-08-12 13:19:58,364 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on SV_voxceleb1: loss=0.004729, beats_loss=0, ecapa_loss=0.0004729, whisper_loss=0, over 939242.00 frames. 2024-08-12 13:21:43,870 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on AT_audioset: loss=0.02432, beats_loss=0.02432, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 13:21:43,874 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 13:22:03,039 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.605e+01 2.854e+01 3.270e+01 6.510e+01, threshold=5.707e+01, percent-clipped=1.0 2024-08-12 13:22:05,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=1654210.0, ans=12.0 2024-08-12 13:22:06,301 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-12 13:22:06,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1654210.0, ans=0.1 2024-08-12 13:22:09,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1654210.0, ans=0.0 2024-08-12 13:22:15,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1654310.0, ans=0.0 2024-08-12 13:22:17,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1654310.0, ans=0.1 2024-08-12 13:22:56,497 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=9.268e-02 2024-08-12 13:23:02,941 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6050, loss[loss=0.1136, beats_loss=0.01171, ecapa_loss=0.0001387, whisper_loss=0.1005, over 16510.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01105, ecapa_loss=0.0001749, whisper_loss=0.09156, over 3871094.83 frames. ], batch size: 63, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:23:08,429 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
12 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 13:23:26,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1654710.0, ans=0.125 2024-08-12 13:23:41,194 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 13:23:49,122 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 13:24:00,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1654910.0, ans=0.0 2024-08-12 13:24:00,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1654910.0, ans=0.0 2024-08-12 13:24:15,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1655010.0, ans=0.1 2024-08-12 13:24:23,331 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6100, loss[loss=0.1043, beats_loss=0.01199, ecapa_loss=0.0001537, whisper_loss=0.09073, over 23927.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001745, whisper_loss=0.09182, over 3901641.11 frames. ], batch size: 93, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:24:28,482 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
19 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 13:24:35,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1655110.0, ans=0.07 2024-08-12 13:24:42,403 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.433e+01 2.687e+01 2.996e+01 4.596e+01, threshold=5.373e+01, percent-clipped=0.0 2024-08-12 13:24:49,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1655210.0, ans=0.0 2024-08-12 13:25:11,192 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 13:25:23,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-08-12 13:25:42,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6150, loss[loss=0.1104, beats_loss=0.01119, ecapa_loss=0.0001469, whisper_loss=0.09775, over 23503.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01104, ecapa_loss=0.0001738, whisper_loss=0.09142, over 3924203.75 frames. ], batch size: 91, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:25:48,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1655610.0, ans=0.07 2024-08-12 13:25:56,234 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 13:26:00,551 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-12 13:26:15,123 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 13:26:15,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.72 vs. 
limit=15.0 2024-08-12 13:26:22,658 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 13:26:36,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1655910.0, ans=0.1 2024-08-12 13:26:39,652 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-12 13:27:01,891 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6200, loss[loss=0.0948, beats_loss=0.009016, ecapa_loss=0.0001815, whisper_loss=0.08397, over 22058.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01096, ecapa_loss=0.0001754, whisper_loss=0.0921, over 3927197.22 frames. ], batch size: 86, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:27:17,675 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-12 13:27:21,056 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.566e+01 2.980e+01 3.459e+01 1.302e+02, threshold=5.960e+01, percent-clipped=3.0 2024-08-12 13:27:29,822 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=15.0 2024-08-12 13:27:42,123 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2024-08-12 13:27:54,366 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 13:28:20,369 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6250, loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001753, whisper_loss=0.09065, over 15647.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01086, ecapa_loss=0.0001766, whisper_loss=0.09248, over 3926583.43 frames. 
], batch size: 61, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:28:30,577 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=12.0 2024-08-12 13:28:55,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1656810.0, ans=0.09899494936611666 2024-08-12 13:28:56,421 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-12 13:29:05,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0 2024-08-12 13:29:26,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1657010.0, ans=0.125 2024-08-12 13:29:30,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1657010.0, ans=0.125 2024-08-12 13:29:38,524 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6300, loss[loss=0.1137, beats_loss=0.01158, ecapa_loss=0.0001575, whisper_loss=0.1006, over 19992.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01083, ecapa_loss=0.0001767, whisper_loss=0.09312, over 3930825.16 frames. ], batch size: 78, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:29:41,130 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-12 13:29:44,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1657110.0, ans=10.0 2024-08-12 13:29:50,625 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 13:29:56,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.43 vs. 
limit=15.0 2024-08-12 13:29:56,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.533e+01 2.764e+01 3.173e+01 6.844e+01, threshold=5.528e+01, percent-clipped=1.0 2024-08-12 13:30:14,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1657310.0, ans=0.125 2024-08-12 13:30:17,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1657310.0, ans=0.1 2024-08-12 13:30:33,710 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-12 13:30:51,231 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-12 13:30:54,842 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6350, loss[loss=0.09946, beats_loss=0.009552, ecapa_loss=0.0002163, whisper_loss=0.08775, over 15479.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01086, ecapa_loss=0.000176, whisper_loss=0.093, over 3916054.78 frames. ], batch size: 65, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:30:55,304 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 13:31:05,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1657610.0, ans=0.125 2024-08-12 13:31:28,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0 2024-08-12 13:31:31,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. 
limit=15.0 2024-08-12 13:31:44,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1657910.0, ans=0.125 2024-08-12 13:31:47,408 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.202e+02 2024-08-12 13:31:49,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1657910.0, ans=0.0 2024-08-12 13:31:53,370 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 13:31:54,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0 2024-08-12 13:31:56,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1658010.0, ans=0.125 2024-08-12 13:31:56,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1658010.0, ans=0.0 2024-08-12 13:31:59,379 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 13:32:11,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6400, loss[loss=0.1065, beats_loss=0.01242, ecapa_loss=0.0001461, whisper_loss=0.09262, over 23294.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01097, ecapa_loss=0.0001752, whisper_loss=0.09194, over 3903656.31 frames. ], batch size: 91, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:32:12,438 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 13:32:26,559 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
27 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 13:32:27,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1658210.0, ans=0.0 2024-08-12 13:32:29,488 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.462e+01 2.715e+01 3.060e+01 4.478e+01, threshold=5.430e+01, percent-clipped=0.0 2024-08-12 13:32:37,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1658210.0, ans=0.0 2024-08-12 13:32:44,449 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 13:32:56,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1658410.0, ans=0.0 2024-08-12 13:33:17,032 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 35 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 13:33:18,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1658510.0, ans=0.0 2024-08-12 13:33:26,047 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6450, loss[loss=0.1147, beats_loss=0.009405, ecapa_loss=0.0001733, whisper_loss=0.1036, over 23342.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.0001757, whisper_loss=0.09181, over 3907403.48 frames. ], batch size: 90, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:33:27,186 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-08-12 13:33:45,536 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.15 vs. 
limit=22.5 2024-08-12 13:33:47,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1658710.0, ans=0.125 2024-08-12 13:34:13,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1658910.0, ans=0.125 2024-08-12 13:34:15,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1658910.0, ans=0.0 2024-08-12 13:34:26,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1659010.0, ans=0.2 2024-08-12 13:34:41,098 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6500, loss[loss=0.09981, beats_loss=0.01169, ecapa_loss=0.0001497, whisper_loss=0.08663, over 20272.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01095, ecapa_loss=0.0001746, whisper_loss=0.09265, over 3906001.80 frames. ], batch size: 80, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:34:58,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 2.642e+01 2.943e+01 3.228e+01 1.281e+02, threshold=5.885e+01, percent-clipped=1.0 2024-08-12 13:35:01,012 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-12 13:35:22,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1659310.0, ans=0.125 2024-08-12 13:35:31,513 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 13:35:40,543 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 13:35:55,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.23 vs. 
limit=12.0 2024-08-12 13:35:55,383 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6550, loss[loss=0.09589, beats_loss=0.01245, ecapa_loss=0.0001988, whisper_loss=0.08145, over 20508.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01096, ecapa_loss=0.0001739, whisper_loss=0.09328, over 3911894.50 frames. ], batch size: 88, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:35:58,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1659610.0, ans=0.1 2024-08-12 13:36:09,601 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 13:36:10,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1659710.0, ans=0.125 2024-08-12 13:36:27,237 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 13:36:33,543 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 21 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-12 13:37:00,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1660010.0, ans=0.0 2024-08-12 13:37:01,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1660010.0, ans=0.1 2024-08-12 13:37:10,601 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6600, loss[loss=0.108, beats_loss=0.01203, ecapa_loss=0.0001916, whisper_loss=0.09405, over 21646.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01099, ecapa_loss=0.0001762, whisper_loss=0.09341, over 3925269.69 frames. 
], batch size: 92, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:37:11,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1660110.0, ans=0.125 2024-08-12 13:37:18,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1660110.0, ans=0.1 2024-08-12 13:37:24,134 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.270e-02 2024-08-12 13:37:25,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1660210.0, ans=0.1 2024-08-12 13:37:28,525 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.664e+01 3.043e+01 3.449e+01 7.276e+01, threshold=6.087e+01, percent-clipped=1.0 2024-08-12 13:37:34,423 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 13:37:41,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1660310.0, ans=0.0 2024-08-12 13:38:12,557 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 13:38:23,777 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6650, loss[loss=0.1061, beats_loss=0.01036, ecapa_loss=0.0001475, whisper_loss=0.0943, over 21973.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01108, ecapa_loss=0.0001762, whisper_loss=0.09235, over 3927340.19 frames. ], batch size: 81, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:38:32,692 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
29 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 13:38:32,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1660610.0, ans=0.2 2024-08-12 13:39:27,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1661010.0, ans=10.0 2024-08-12 13:39:32,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0 2024-08-12 13:39:35,036 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6700, loss[loss=0.1202, beats_loss=0.009261, ecapa_loss=0.0002022, whisper_loss=0.1089, over 18394.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01096, ecapa_loss=0.0001771, whisper_loss=0.09287, over 3907712.26 frames. ], batch size: 69, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:39:52,505 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.550e+01 2.931e+01 3.277e+01 4.693e+01, threshold=5.862e+01, percent-clipped=0.0 2024-08-12 13:39:54,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1661210.0, ans=0.1 2024-08-12 13:39:57,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1661210.0, ans=0.07 2024-08-12 13:40:00,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1661210.0, ans=0.125 2024-08-12 13:40:06,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1661310.0, ans=0.2 2024-08-12 13:40:09,358 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 13:40:12,219 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
27 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-12 13:40:12,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1661310.0, ans=0.125 2024-08-12 13:40:26,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1661410.0, ans=0.2 2024-08-12 13:40:29,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=12.0 2024-08-12 13:40:30,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1661410.0, ans=0.05 2024-08-12 13:40:37,821 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 21 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-12 13:40:47,698 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6750, loss[loss=0.1013, beats_loss=0.0115, ecapa_loss=0.0001706, whisper_loss=0.08811, over 17916.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0109, ecapa_loss=0.000178, whisper_loss=0.09317, over 3887946.43 frames. ], batch size: 70, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:40:57,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1661610.0, ans=0.125 2024-08-12 13:41:38,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1661910.0, ans=0.04949747468305833 2024-08-12 13:41:40,544 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 13:41:43,677 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
22 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 13:41:50,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1662010.0, ans=0.0 2024-08-12 13:41:58,502 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6800, loss[loss=0.09304, beats_loss=0.01255, ecapa_loss=0.0002248, whisper_loss=0.07824, over 22080.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01089, ecapa_loss=0.0001774, whisper_loss=0.09326, over 3882245.59 frames. ], batch size: 93, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:42:03,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-12 13:42:14,993 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.516e+01 2.723e+01 3.027e+01 3.885e+01, threshold=5.446e+01, percent-clipped=0.0 2024-08-12 13:42:19,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1662210.0, ans=0.125 2024-08-12 13:42:29,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1662310.0, ans=0.1 2024-08-12 13:42:40,143 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 13:43:08,238 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6850, loss[loss=0.08395, beats_loss=0.01107, ecapa_loss=0.0002029, whisper_loss=0.07085, over 22063.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01089, ecapa_loss=0.0001777, whisper_loss=0.09292, over 3870862.67 frames. ], batch size: 95, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:43:08,415 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 13:43:23,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1662710.0, ans=0.0 2024-08-12 13:43:53,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1662910.0, ans=0.0 2024-08-12 13:44:12,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1663010.0, ans=0.1 2024-08-12 13:44:19,358 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6900, loss[loss=0.09885, beats_loss=0.0105, ecapa_loss=0.0001666, whisper_loss=0.08667, over 19202.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01099, ecapa_loss=0.0001765, whisper_loss=0.09225, over 3872052.02 frames. ], batch size: 76, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:44:26,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2024-08-12 13:44:30,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1663110.0, ans=0.05 2024-08-12 13:44:32,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1663210.0, ans=0.0 2024-08-12 13:44:35,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.437e+01 2.665e+01 2.983e+01 5.492e+01, threshold=5.330e+01, percent-clipped=1.0 2024-08-12 13:44:37,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.70 vs. limit=5.0 2024-08-12 13:44:39,129 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 13:44:41,860 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 13:44:42,486 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0 2024-08-12 13:44:46,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1663310.0, ans=0.1 2024-08-12 13:44:47,681 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 13:44:49,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1663310.0, ans=0.2 2024-08-12 13:44:50,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1663310.0, ans=0.125 2024-08-12 13:44:53,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1663310.0, ans=0.1 2024-08-12 13:45:05,197 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2024-08-12 13:45:06,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1663410.0, ans=0.0 2024-08-12 13:45:11,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-12 13:45:17,478 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-12 13:45:19,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.71 vs. 
limit=15.0 2024-08-12 13:45:24,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1663510.0, ans=0.1 2024-08-12 13:45:29,924 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 6950, loss[loss=0.1185, beats_loss=0.01097, ecapa_loss=0.0001729, whisper_loss=0.1059, over 22462.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01102, ecapa_loss=0.0001764, whisper_loss=0.09209, over 3907222.83 frames. ], batch size: 89, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:45:52,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1663710.0, ans=0.125 2024-08-12 13:46:20,007 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 13:46:42,870 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7000, loss[loss=0.09552, beats_loss=0.0135, ecapa_loss=0.0001711, whisper_loss=0.08031, over 21697.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01106, ecapa_loss=0.0001769, whisper_loss=0.09184, over 3891921.05 frames. ], batch size: 89, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:46:49,452 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.34 vs. 
limit=15.0 2024-08-12 13:46:55,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1664110.0, ans=0.1 2024-08-12 13:46:57,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1664210.0, ans=0.025 2024-08-12 13:46:57,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1664210.0, ans=0.0 2024-08-12 13:46:59,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1664210.0, ans=0.015 2024-08-12 13:46:59,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=15.0 2024-08-12 13:47:00,011 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.453e+01 2.767e+01 3.334e+01 1.862e+02, threshold=5.533e+01, percent-clipped=4.0 2024-08-12 13:47:16,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1664310.0, ans=0.125 2024-08-12 13:47:25,003 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 13:47:31,667 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2024-08-12 13:47:43,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1664510.0, ans=0.2 2024-08-12 13:47:55,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7050, loss[loss=0.1205, beats_loss=0.0112, ecapa_loss=0.0002203, whisper_loss=0.1071, over 21531.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01098, ecapa_loss=0.0001786, whisper_loss=0.09227, over 3884557.68 frames. 
], batch size: 89, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:48:00,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1664610.0, ans=0.1 2024-08-12 13:48:14,724 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 22 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 13:48:40,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1664910.0, ans=0.125 2024-08-12 13:48:40,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1664910.0, ans=0.1 2024-08-12 13:48:48,448 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 13:48:56,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1665010.0, ans=0.95 2024-08-12 13:48:58,371 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.20 vs. limit=22.5 2024-08-12 13:49:04,831 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 13:49:06,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1665010.0, ans=0.5 2024-08-12 13:49:09,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7100, loss[loss=0.1103, beats_loss=0.01043, ecapa_loss=0.0001859, whisper_loss=0.09803, over 20928.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.000177, whisper_loss=0.09213, over 3871386.63 frames. 
], batch size: 88, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:49:25,956 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.512e+01 2.817e+01 3.133e+01 5.318e+01, threshold=5.634e+01, percent-clipped=0.0 2024-08-12 13:49:26,113 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 13:49:48,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1665310.0, ans=0.0 2024-08-12 13:50:09,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1665510.0, ans=0.07 2024-08-12 13:50:21,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1665610.0, ans=0.035 2024-08-12 13:50:22,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7150, loss[loss=0.09988, beats_loss=0.01271, ecapa_loss=0.0001427, whisper_loss=0.08574, over 22687.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01102, ecapa_loss=0.000176, whisper_loss=0.09223, over 3891129.41 frames. ], batch size: 89, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:50:25,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1665610.0, ans=0.125 2024-08-12 13:50:29,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1665610.0, ans=0.125 2024-08-12 13:50:40,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1665710.0, ans=0.07 2024-08-12 13:50:45,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1665710.0, ans=0.1 2024-08-12 13:50:55,591 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
29 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 13:51:25,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1666010.0, ans=0.1 2024-08-12 13:51:32,166 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 13:51:34,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1666010.0, ans=0.125 2024-08-12 13:51:35,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=22.5 2024-08-12 13:51:36,173 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7200, loss[loss=0.09277, beats_loss=0.01352, ecapa_loss=0.000178, whisper_loss=0.07747, over 21287.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01092, ecapa_loss=0.0001759, whisper_loss=0.09322, over 3910764.47 frames. ], batch size: 88, lr: 5.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:51:44,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1666110.0, ans=0.125 2024-08-12 13:51:53,631 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.582e+01 2.995e+01 3.267e+01 4.717e+01, threshold=5.989e+01, percent-clipped=0.0 2024-08-12 13:51:54,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1666210.0, ans=0.0 2024-08-12 13:51:58,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1666210.0, ans=15.0 2024-08-12 13:51:59,497 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 13:52:00,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1666210.0, ans=0.0 2024-08-12 13:52:08,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1666310.0, ans=0.1 2024-08-12 13:52:17,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1666310.0, ans=0.2 2024-08-12 13:52:46,453 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-12 13:52:48,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7250, loss[loss=0.09921, beats_loss=0.01147, ecapa_loss=0.0001511, whisper_loss=0.08622, over 22819.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.011, ecapa_loss=0.0001747, whisper_loss=0.09297, over 3929756.42 frames. ], batch size: 90, lr: 5.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:53:12,822 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2024-08-12 13:53:13,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1666710.0, ans=0.0 2024-08-12 13:53:14,332 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2024-08-12 13:53:18,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1666810.0, ans=0.125 2024-08-12 13:53:26,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1666810.0, ans=0.125 2024-08-12 13:53:29,327 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
35 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 13:53:29,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1666810.0, ans=0.125 2024-08-12 13:53:37,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.26 vs. limit=15.0 2024-08-12 13:54:05,196 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7300, loss[loss=0.1042, beats_loss=0.008295, ecapa_loss=0.0001965, whisper_loss=0.09395, over 20353.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01103, ecapa_loss=0.000175, whisper_loss=0.09296, over 3922399.61 frames. ], batch size: 80, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:54:06,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1667110.0, ans=0.125 2024-08-12 13:54:10,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1667110.0, ans=0.0 2024-08-12 13:54:14,642 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 32 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 13:54:24,191 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.453e+01 2.736e+01 3.058e+01 4.580e+01, threshold=5.471e+01, percent-clipped=0.0 2024-08-12 13:54:38,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1667310.0, ans=0.07 2024-08-12 13:54:42,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1667310.0, ans=0.025 2024-08-12 13:54:45,983 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 13:55:23,497 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7350, loss[loss=0.1375, beats_loss=0.009887, ecapa_loss=0.0001268, whisper_loss=0.1263, over 15930.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0109, ecapa_loss=0.0001747, whisper_loss=0.0935, over 3900719.98 frames. ], batch size: 55, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:55:24,196 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.55 vs. limit=22.5 2024-08-12 13:55:26,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1667610.0, ans=0.0 2024-08-12 13:55:27,100 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 13:55:27,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1667610.0, ans=0.0 2024-08-12 13:55:36,579 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 37 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 13:55:42,585 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 13:56:07,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1667810.0, ans=0.0 2024-08-12 13:56:10,177 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 13:56:41,157 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7400, loss[loss=0.1162, beats_loss=0.01053, ecapa_loss=0.0001848, whisper_loss=0.1039, over 22362.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01092, ecapa_loss=0.0001747, whisper_loss=0.09295, over 3914590.14 frames. ], batch size: 90, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:56:41,553 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
23 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-12 13:56:50,692 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.391e+00 2024-08-12 13:56:58,740 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.596e+01 2.915e+01 3.233e+01 4.650e+01, threshold=5.831e+01, percent-clipped=0.0 2024-08-12 13:57:31,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1668410.0, ans=0.125 2024-08-12 13:57:49,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1668510.0, ans=0.2 2024-08-12 13:57:51,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1668510.0, ans=0.0 2024-08-12 13:57:55,764 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7450, loss[loss=0.09335, beats_loss=0.01278, ecapa_loss=0.0001595, whisper_loss=0.07897, over 21968.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01086, ecapa_loss=0.0001769, whisper_loss=0.09316, over 3880133.43 frames. ], batch size: 91, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:58:21,088 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-12 13:58:32,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1668810.0, ans=0.1 2024-08-12 13:58:37,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.75 vs. 
limit=10.0 2024-08-12 13:58:42,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1668910.0, ans=0.125 2024-08-12 13:58:48,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1668910.0, ans=0.125 2024-08-12 13:58:50,238 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 13:59:12,864 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7500, loss[loss=0.08998, beats_loss=0.01051, ecapa_loss=0.0001967, whisper_loss=0.0775, over 16617.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01084, ecapa_loss=0.0001769, whisper_loss=0.09334, over 3859231.12 frames. ], batch size: 67, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:59:22,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1669110.0, ans=0.125 2024-08-12 13:59:29,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1669210.0, ans=0.125 2024-08-12 13:59:30,399 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.570e+01 2.878e+01 3.293e+01 5.497e+01, threshold=5.755e+01, percent-clipped=0.0 2024-08-12 13:59:30,940 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 13:59:37,192 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 13:59:44,761 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-12 13:59:46,130 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 13:59:53,149 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
25 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 13:59:54,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1669310.0, ans=0.125 2024-08-12 14:00:04,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1669410.0, ans=0.0 2024-08-12 14:00:20,619 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-12 14:00:26,379 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7550, loss[loss=0.1243, beats_loss=0.01045, ecapa_loss=0.000217, whisper_loss=0.1117, over 22287.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01092, ecapa_loss=0.0001771, whisper_loss=0.09283, over 3846919.91 frames. ], batch size: 92, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:00:34,562 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 14:00:40,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1669710.0, ans=0.125 2024-08-12 14:01:08,753 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 14:01:17,056 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2024-08-12 14:01:19,178 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 14:01:21,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.13 vs. 
limit=15.0 2024-08-12 14:01:28,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1670010.0, ans=0.0 2024-08-12 14:01:39,663 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 14:01:41,052 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7600, loss[loss=0.0873, beats_loss=0.0113, ecapa_loss=0.0002116, whisper_loss=0.07388, over 17690.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01092, ecapa_loss=0.0001772, whisper_loss=0.09213, over 3842094.98 frames. ], batch size: 76, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:01:55,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1670210.0, ans=0.95 2024-08-12 14:01:58,011 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 14:01:59,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.500e+01 2.707e+01 3.102e+01 5.200e+01, threshold=5.414e+01, percent-clipped=0.0 2024-08-12 14:01:59,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1670210.0, ans=0.125 2024-08-12 14:02:08,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1670210.0, ans=0.125 2024-08-12 14:02:09,991 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 14:02:13,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1670310.0, ans=0.0 2024-08-12 14:02:42,979 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
35 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 14:02:48,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1670510.0, ans=0.125 2024-08-12 14:02:50,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1670510.0, ans=0.1 2024-08-12 14:02:52,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1670510.0, ans=0.0 2024-08-12 14:02:57,286 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7650, loss[loss=0.09654, beats_loss=0.01027, ecapa_loss=0.0001835, whisper_loss=0.08443, over 16409.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01088, ecapa_loss=0.0001779, whisper_loss=0.09271, over 3864035.70 frames. ], batch size: 64, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:03:01,217 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 14:03:03,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1670610.0, ans=0.125 2024-08-12 14:03:15,618 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 14:03:17,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1670710.0, ans=0.125 2024-08-12 14:04:02,221 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 26 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-12 14:04:07,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2024-08-12 14:04:10,289 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 14:04:12,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1671010.0, ans=0.125 2024-08-12 14:04:14,449 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7700, loss[loss=0.0826, beats_loss=0.01239, ecapa_loss=0.0001763, whisper_loss=0.06845, over 18499.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01078, ecapa_loss=0.0001772, whisper_loss=0.09303, over 3887301.53 frames. ], batch size: 76, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:04:31,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1671210.0, ans=0.125 2024-08-12 14:04:33,467 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.546e+01 2.810e+01 3.287e+01 1.654e+02, threshold=5.620e+01, percent-clipped=2.0 2024-08-12 14:04:34,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0 2024-08-12 14:04:37,189 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 14:04:48,644 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 14:05:08,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1671410.0, ans=0.125 2024-08-12 14:05:08,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1671410.0, ans=0.0 2024-08-12 14:05:11,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1671410.0, ans=0.1 2024-08-12 14:05:18,417 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 14:05:25,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1671510.0, ans=0.125 2024-08-12 14:05:27,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1671510.0, ans=0.125 2024-08-12 14:05:36,112 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7750, loss[loss=0.1078, beats_loss=0.01097, ecapa_loss=0.0001795, whisper_loss=0.095, over 23346.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01085, ecapa_loss=0.0001768, whisper_loss=0.09201, over 3910532.95 frames. ], batch size: 97, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:05:39,164 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 34 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 14:05:44,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1671610.0, ans=0.125 2024-08-12 14:05:47,095 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 14:05:56,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1671710.0, ans=0.125 2024-08-12 14:06:22,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1671810.0, ans=0.0 2024-08-12 14:06:30,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1671910.0, ans=0.0 2024-08-12 14:06:57,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1672010.0, ans=15.0 2024-08-12 14:07:02,805 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7800, loss[loss=0.1058, beats_loss=0.009855, ecapa_loss=0.000175, whisper_loss=0.09417, over 14751.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01085, ecapa_loss=0.0001759, whisper_loss=0.09252, over 3896802.18 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:07:05,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1672110.0, ans=0.125 2024-08-12 14:07:15,900 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-12 14:07:22,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1672210.0, ans=0.125 2024-08-12 14:07:23,532 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.562e+01 2.777e+01 3.107e+01 5.363e+01, threshold=5.555e+01, percent-clipped=0.0 2024-08-12 14:07:27,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1672210.0, ans=0.125 2024-08-12 14:07:39,163 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
32 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 14:07:54,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1672410.0, ans=0.125 2024-08-12 14:07:57,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.89 vs. limit=22.5 2024-08-12 14:08:08,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1672410.0, ans=0.0 2024-08-12 14:08:09,730 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 14:08:22,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1672510.0, ans=0.05 2024-08-12 14:08:28,802 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7850, loss[loss=0.09689, beats_loss=0.01351, ecapa_loss=0.0001692, whisper_loss=0.08169, over 22094.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01099, ecapa_loss=0.0001749, whisper_loss=0.09219, over 3924949.79 frames. ], batch size: 90, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:09:03,718 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-12 14:09:10,728 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 14:09:53,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1673010.0, ans=0.125 2024-08-12 14:09:55,728 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
25 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-12 14:09:56,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1673010.0, ans=0.125 2024-08-12 14:09:59,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7900, loss[loss=0.1105, beats_loss=0.0108, ecapa_loss=0.0001839, whisper_loss=0.09789, over 22540.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.000175, whisper_loss=0.09266, over 3933734.78 frames. ], batch size: 89, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:10:01,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2024-08-12 14:10:13,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1673210.0, ans=0.125 2024-08-12 14:10:17,557 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.710e+01 2.918e+01 3.314e+01 4.550e+01, threshold=5.837e+01, percent-clipped=0.0 2024-08-12 14:11:00,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.11 vs. limit=22.5 2024-08-12 14:11:04,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=15.0 2024-08-12 14:11:18,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 7950, loss[loss=0.09728, beats_loss=0.01141, ecapa_loss=0.0001856, whisper_loss=0.08402, over 22669.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01092, ecapa_loss=0.0001764, whisper_loss=0.09327, over 3941530.29 frames. 
], batch size: 94, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:11:20,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1673610.0, ans=0.0 2024-08-12 14:11:29,020 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 14:11:38,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1673710.0, ans=0.125 2024-08-12 14:11:47,550 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-12 14:12:29,782 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 14:12:33,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1674010.0, ans=0.2 2024-08-12 14:12:45,881 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=12.0 2024-08-12 14:12:47,962 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8000, loss[loss=0.09211, beats_loss=0.01289, ecapa_loss=0.0001505, whisper_loss=0.07772, over 19416.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01092, ecapa_loss=0.0001749, whisper_loss=0.09349, over 3927265.97 frames. ], batch size: 76, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:13:06,774 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
28 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 14:13:07,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.597e+01 2.930e+01 3.466e+01 8.592e+01, threshold=5.860e+01, percent-clipped=1.0 2024-08-12 14:13:16,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1674210.0, ans=0.0 2024-08-12 14:13:39,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1674410.0, ans=0.1 2024-08-12 14:13:52,968 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 31 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 14:14:03,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1674510.0, ans=0.1 2024-08-12 14:14:13,415 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 14:14:16,436 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8050, loss[loss=0.07568, beats_loss=0.01031, ecapa_loss=0.0001728, whisper_loss=0.06364, over 16446.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01093, ecapa_loss=0.0001753, whisper_loss=0.09246, over 3888983.24 frames. ], batch size: 66, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:14:25,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1674610.0, ans=0.0 2024-08-12 14:15:51,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8100, loss[loss=0.09805, beats_loss=0.01052, ecapa_loss=0.0002255, whisper_loss=0.08528, over 21097.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.0001742, whisper_loss=0.09236, over 3858839.92 frames. 
], batch size: 92, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:15:54,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1675110.0, ans=0.0 2024-08-12 14:16:00,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1675110.0, ans=0.1 2024-08-12 14:16:00,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1675110.0, ans=0.125 2024-08-12 14:16:04,241 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 14:16:12,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.342e+01 2.574e+01 2.867e+01 4.166e+01, threshold=5.148e+01, percent-clipped=0.0 2024-08-12 14:16:19,515 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2024-08-12 14:16:25,955 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=22.5 2024-08-12 14:16:26,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=15.0 2024-08-12 14:16:56,132 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 14:17:14,636 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2024-08-12 14:17:18,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8150, loss[loss=0.08807, beats_loss=0.01322, ecapa_loss=0.0001318, whisper_loss=0.07353, over 21647.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01097, ecapa_loss=0.000175, whisper_loss=0.09174, over 3858398.22 frames. ], batch size: 86, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:17:31,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1675610.0, ans=0.035 2024-08-12 14:18:03,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1675810.0, ans=0.1 2024-08-12 14:18:08,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1675810.0, ans=0.0 2024-08-12 14:18:14,746 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=22.5 2024-08-12 14:18:16,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1675910.0, ans=0.125 2024-08-12 14:18:30,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1675910.0, ans=0.125 2024-08-12 14:18:31,555 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 14:18:43,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1676010.0, ans=0.1 2024-08-12 14:18:50,951 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8200, loss[loss=0.1121, beats_loss=0.01017, ecapa_loss=0.0002096, whisper_loss=0.09979, over 22460.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01098, ecapa_loss=0.0001756, whisper_loss=0.09159, over 3898386.84 frames. 
], batch size: 93, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:18:51,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1676110.0, ans=0.125 2024-08-12 14:19:04,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1676110.0, ans=0.125 2024-08-12 14:19:12,792 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.595e+01 2.929e+01 3.219e+01 5.675e+01, threshold=5.858e+01, percent-clipped=2.0 2024-08-12 14:19:16,004 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 38 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 14:19:19,475 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 25 from LS+wenet, 8 from Vox, 22 fro AS 2024-08-12 14:19:21,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1676210.0, ans=0.0 2024-08-12 14:19:27,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1676310.0, ans=0.0 2024-08-12 14:19:35,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.40 vs. limit=22.5 2024-08-12 14:20:09,940 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2024-08-12 14:20:09,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.25 vs. limit=15.0 2024-08-12 14:20:10,538 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
21 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-12 14:20:13,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1676510.0, ans=0.125 2024-08-12 14:20:15,627 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8250, loss[loss=0.1002, beats_loss=0.009553, ecapa_loss=0.0001385, whisper_loss=0.0893, over 16868.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01105, ecapa_loss=0.0001752, whisper_loss=0.09101, over 3911959.26 frames. ], batch size: 65, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:20:21,305 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 14:20:22,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1676610.0, ans=0.0 2024-08-12 14:20:27,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1676610.0, ans=0.2 2024-08-12 14:20:29,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1676610.0, ans=0.0 2024-08-12 14:20:34,313 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.28 vs. limit=22.5 2024-08-12 14:20:45,364 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 14:20:47,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1676710.0, ans=0.1 2024-08-12 14:21:05,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1676810.0, ans=0.125 2024-08-12 14:21:19,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1676910.0, ans=0.125 2024-08-12 14:21:24,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1676910.0, ans=0.125 2024-08-12 14:21:46,324 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8300, loss[loss=0.1057, beats_loss=0.01006, ecapa_loss=0.0001957, whisper_loss=0.09372, over 17105.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01104, ecapa_loss=0.0001751, whisper_loss=0.09122, over 3900213.55 frames. ], batch size: 67, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:22:05,063 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 14:22:06,286 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.461e+01 2.729e+01 3.210e+01 2.355e+02, threshold=5.459e+01, percent-clipped=3.0 2024-08-12 14:22:26,065 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-12 14:22:28,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1677310.0, ans=0.1 2024-08-12 14:22:50,389 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 14:22:51,960 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-12 14:22:53,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1677510.0, ans=0.1 2024-08-12 14:23:10,524 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 14:23:12,310 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8350, loss[loss=0.1046, beats_loss=0.01096, ecapa_loss=0.0002233, whisper_loss=0.09137, over 21165.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01096, ecapa_loss=0.0001759, whisper_loss=0.09234, over 3923523.65 frames. ], batch size: 88, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:23:16,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-12 14:23:27,044 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-12 14:23:33,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1677710.0, ans=10.0 2024-08-12 14:23:43,256 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 14:24:15,054 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 14:24:19,081 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=7.843e-03 2024-08-12 14:24:28,260 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 14:24:31,622 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 14:24:38,422 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8400, loss[loss=0.07006, beats_loss=0.01255, ecapa_loss=0.0002036, whisper_loss=0.05548, over 21411.00 frames. 
], tot_loss[loss=0.1053, beats_loss=0.01096, ecapa_loss=0.0001765, whisper_loss=0.09262, over 3906791.21 frames. ], batch size: 91, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:24:51,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1678110.0, ans=0.0 2024-08-12 14:24:54,527 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 14:24:59,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.506e+01 2.766e+01 3.211e+01 4.644e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 14:25:11,380 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 14:25:16,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1678310.0, ans=0.1 2024-08-12 14:25:17,621 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 14:25:26,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1678310.0, ans=0.2 2024-08-12 14:25:32,132 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2024-08-12 14:25:40,350 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 14:25:57,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1678510.0, ans=0.125 2024-08-12 14:25:59,845 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 14:26:03,006 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8450, loss[loss=0.09755, beats_loss=0.01095, ecapa_loss=0.0001997, whisper_loss=0.0846, over 20246.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01091, ecapa_loss=0.0001756, whisper_loss=0.09298, over 3906127.30 frames. ], batch size: 84, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:26:03,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1678610.0, ans=0.2 2024-08-12 14:26:24,787 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 14:26:41,642 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.88 vs. limit=22.5 2024-08-12 14:26:45,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1678810.0, ans=0.1 2024-08-12 14:26:49,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.36 vs. limit=10.0 2024-08-12 14:26:51,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.59 vs. limit=15.0 2024-08-12 14:26:59,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.47 vs. 
limit=22.5 2024-08-12 14:27:12,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1679010.0, ans=0.125 2024-08-12 14:27:14,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1679010.0, ans=0.125 2024-08-12 14:27:20,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1679010.0, ans=0.2 2024-08-12 14:27:24,378 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8500, loss[loss=0.09336, beats_loss=0.0116, ecapa_loss=0.0001678, whisper_loss=0.08009, over 22535.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01096, ecapa_loss=0.0001766, whisper_loss=0.09255, over 3896299.25 frames. ], batch size: 93, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:27:26,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1679110.0, ans=0.125 2024-08-12 14:27:37,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1679110.0, ans=0.0 2024-08-12 14:27:44,828 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.527e+01 2.828e+01 3.185e+01 5.995e+01, threshold=5.655e+01, percent-clipped=1.0 2024-08-12 14:27:47,373 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 14:27:53,719 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 14:28:03,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1679310.0, ans=0.125 2024-08-12 14:28:18,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1679410.0, ans=0.125 2024-08-12 14:28:22,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1679410.0, ans=0.125 2024-08-12 14:28:23,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1679410.0, ans=0.2 2024-08-12 14:28:36,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.33 vs. limit=12.0 2024-08-12 14:28:46,109 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.99 vs. limit=10.0 2024-08-12 14:28:55,235 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8550, loss[loss=0.1114, beats_loss=0.01097, ecapa_loss=0.0001503, whisper_loss=0.09893, over 20327.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01098, ecapa_loss=0.0001767, whisper_loss=0.09182, over 3863693.29 frames. ], batch size: 80, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:28:56,148 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.91 vs. 
limit=15.0 2024-08-12 14:29:16,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1679710.0, ans=0.125 2024-08-12 14:29:18,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1679710.0, ans=0.0 2024-08-12 14:29:21,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1679710.0, ans=0.125 2024-08-12 14:29:32,604 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:29:38,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1679810.0, ans=0.125 2024-08-12 14:30:26,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.60 vs. limit=22.5 2024-08-12 14:30:32,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8600, loss[loss=0.1044, beats_loss=0.01179, ecapa_loss=0.0001506, whisper_loss=0.09111, over 19940.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.011, ecapa_loss=0.0001765, whisper_loss=0.09253, over 3880154.23 frames. ], batch size: 78, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:30:55,542 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.576e+01 2.836e+01 3.188e+01 4.951e+01, threshold=5.672e+01, percent-clipped=0.0 2024-08-12 14:30:55,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1680210.0, ans=0.125 2024-08-12 14:31:17,119 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 14:31:25,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1680310.0, ans=0.125 2024-08-12 14:31:25,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1680310.0, ans=0.125 2024-08-12 14:31:30,790 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.569e-02 2024-08-12 14:31:48,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.69 vs. limit=10.0 2024-08-12 14:31:49,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1680510.0, ans=0.1 2024-08-12 14:31:54,933 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8650, loss[loss=0.1127, beats_loss=0.01062, ecapa_loss=0.0002143, whisper_loss=0.09998, over 12776.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01103, ecapa_loss=0.0001758, whisper_loss=0.09207, over 3869440.82 frames. ], batch size: 53, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:32:01,182 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2024-08-12 14:32:16,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.81 vs. limit=22.5 2024-08-12 14:32:18,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1680710.0, ans=0.2 2024-08-12 14:32:20,952 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 14:32:21,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1680710.0, ans=0.0 2024-08-12 14:32:24,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=1680810.0, ans=0.02 2024-08-12 14:32:47,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1680910.0, ans=0.125 2024-08-12 14:32:53,245 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 14:33:02,948 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-12 14:33:07,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8700, loss[loss=0.1065, beats_loss=0.01003, ecapa_loss=0.0001963, whisper_loss=0.09455, over 17725.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01104, ecapa_loss=0.0001775, whisper_loss=0.09204, over 3899642.40 frames. ], batch size: 72, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:33:12,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1681110.0, ans=0.125 2024-08-12 14:33:25,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.512e+01 2.777e+01 3.126e+01 4.363e+01, threshold=5.553e+01, percent-clipped=0.0 2024-08-12 14:33:32,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1681210.0, ans=0.125 2024-08-12 14:33:34,955 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
34 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 14:33:35,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1681210.0, ans=0.125 2024-08-12 14:33:39,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0 2024-08-12 14:33:46,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1681310.0, ans=0.125 2024-08-12 14:33:53,381 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2024-08-12 14:34:01,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1681410.0, ans=0.125 2024-08-12 14:34:04,168 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 14:34:12,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2024-08-12 14:34:21,463 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8750, loss[loss=0.117, beats_loss=0.0109, ecapa_loss=0.000164, whisper_loss=0.1045, over 21220.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01096, ecapa_loss=0.0001768, whisper_loss=0.09216, over 3884598.86 frames. ], batch size: 83, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:34:29,082 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.67 vs. 
limit=22.5 2024-08-12 14:34:50,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1681810.0, ans=0.0 2024-08-12 14:34:50,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=22.5 2024-08-12 14:34:55,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=12.0 2024-08-12 14:35:16,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.04 vs. limit=6.0 2024-08-12 14:35:33,927 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8800, loss[loss=0.08178, beats_loss=0.009303, ecapa_loss=0.0002046, whisper_loss=0.07043, over 14753.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.0001759, whisper_loss=0.09198, over 3842592.25 frames. ], batch size: 59, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:35:37,427 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
23 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 14:35:37,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1682110.0, ans=0.125 2024-08-12 14:35:41,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1682110.0, ans=0.0 2024-08-12 14:35:53,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.570e+01 2.828e+01 3.387e+01 1.190e+02, threshold=5.656e+01, percent-clipped=1.0 2024-08-12 14:35:59,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1682210.0, ans=0.0 2024-08-12 14:36:25,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2024-08-12 14:36:31,908 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-12 14:36:45,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1682510.0, ans=0.125 2024-08-12 14:36:56,324 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8850, loss[loss=0.08361, beats_loss=0.01192, ecapa_loss=0.0001926, whisper_loss=0.06976, over 17656.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01109, ecapa_loss=0.0001741, whisper_loss=0.09147, over 3858573.64 frames. ], batch size: 76, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:36:56,821 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
20 from LS+wenet, 33 from Vox, 25 fro AS 2024-08-12 14:37:28,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1682810.0, ans=0.0 2024-08-12 14:37:32,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1682810.0, ans=0.1 2024-08-12 14:37:41,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1682910.0, ans=0.1 2024-08-12 14:37:44,712 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-12 14:37:48,126 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 14:37:48,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.05 vs. limit=10.0 2024-08-12 14:37:55,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1682910.0, ans=0.125 2024-08-12 14:38:00,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1683010.0, ans=0.1 2024-08-12 14:38:08,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1683010.0, ans=0.125 2024-08-12 14:38:09,838 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 14:38:12,045 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2024-08-12 14:38:16,495 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8900, loss[loss=0.1043, beats_loss=0.01547, ecapa_loss=0.00014, whisper_loss=0.08743, over 18364.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01119, ecapa_loss=0.0001732, whisper_loss=0.09095, over 3844836.68 frames. ], batch size: 73, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:38:26,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1683110.0, ans=22.5 2024-08-12 14:38:37,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.454e+01 2.719e+01 3.172e+01 4.928e+01, threshold=5.438e+01, percent-clipped=0.0 2024-08-12 14:38:43,171 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-12 14:39:04,187 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 14:39:07,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1683410.0, ans=0.1 2024-08-12 14:39:07,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1683410.0, ans=0.125 2024-08-12 14:39:29,508 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 14:39:34,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1683510.0, ans=0.0 2024-08-12 14:39:34,827 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=15.0 2024-08-12 14:39:38,678 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 8950, loss[loss=0.1397, beats_loss=0.008942, ecapa_loss=0.0002024, whisper_loss=0.1288, over 23681.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01123, ecapa_loss=0.0001727, whisper_loss=0.09118, over 3867356.88 frames. 
], batch size: 91, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:39:43,200 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-12 14:39:50,149 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 14:40:02,018 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 14:40:02,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1683710.0, ans=0.0 2024-08-12 14:40:12,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1683810.0, ans=0.0 2024-08-12 14:40:14,209 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 14:40:22,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1683810.0, ans=0.125 2024-08-12 14:40:28,393 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 13 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 14:40:39,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1683910.0, ans=0.125 2024-08-12 14:40:40,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2024-08-12 14:40:59,033 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9000, loss[loss=0.1068, beats_loss=0.01147, ecapa_loss=0.0001592, whisper_loss=0.09371, over 18740.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01112, ecapa_loss=0.0001738, whisper_loss=0.09181, over 3891075.68 frames. 
], batch size: 73, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:40:59,033 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 14:41:38,357 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on ASR_libri: loss=0.2545, beats_loss=0, ecapa_loss=0.000585, whisper_loss=0.2487, over 922467.00 frames. 2024-08-12 14:41:44,955 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.0570, 1.7773, 2.1382, 1.2542], device='cuda:2') 2024-08-12 14:41:57,488 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on SV_voxceleb1: loss=0.004785, beats_loss=0, ecapa_loss=0.0004785, whisper_loss=0, over 939242.00 frames. 2024-08-12 14:43:56,671 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on AT_audioset: loss=0.02422, beats_loss=0.02422, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 14:43:56,675 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 14:43:59,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1684110.0, ans=0.125 2024-08-12 14:44:15,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.474e+01 2.766e+01 3.028e+01 3.985e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 14:44:25,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1684210.0, ans=0.125 2024-08-12 14:44:32,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1684310.0, ans=0.1 2024-08-12 14:44:32,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1684310.0, ans=0.125 2024-08-12 14:44:34,345 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1684310.0, ans=0.09899494936611666 2024-08-12 14:44:35,946 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 14:44:38,796 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 14:44:45,742 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 15 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 14:45:03,272 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 14:45:09,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1684510.0, ans=0.125 2024-08-12 14:45:15,211 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9050, loss[loss=0.131, beats_loss=0.01059, ecapa_loss=0.0001802, whisper_loss=0.1186, over 22711.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01104, ecapa_loss=0.0001736, whisper_loss=0.09259, over 3878592.44 frames. ], batch size: 89, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:45:30,730 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 14:45:31,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=22.5 2024-08-12 14:45:51,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1684810.0, ans=0.125 2024-08-12 14:45:56,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-08-12 14:46:18,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.00 vs. 
limit=10.0 2024-08-12 14:46:18,859 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 14:46:33,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1685110.0, ans=0.125 2024-08-12 14:46:35,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9100, loss[loss=0.09391, beats_loss=0.0115, ecapa_loss=0.0001672, whisper_loss=0.08074, over 22213.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01105, ecapa_loss=0.0001743, whisper_loss=0.09259, over 3885740.32 frames. ], batch size: 94, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:46:37,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1685110.0, ans=0.125 2024-08-12 14:46:39,669 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 14:46:52,764 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.564e+01 2.836e+01 3.271e+01 5.149e+01, threshold=5.673e+01, percent-clipped=0.0 2024-08-12 14:46:55,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=12.0 2024-08-12 14:47:30,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1685410.0, ans=0.125 2024-08-12 14:47:31,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-08-12 14:47:48,450 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 14:47:51,596 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9150, loss[loss=0.122, beats_loss=0.009345, ecapa_loss=0.0001923, whisper_loss=0.1107, over 22312.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01098, ecapa_loss=0.0001747, whisper_loss=0.09263, over 3896195.79 frames. ], batch size: 89, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:48:21,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1685810.0, ans=0.0 2024-08-12 14:48:33,519 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 14:48:34,824 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 14:48:58,832 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 14:49:06,736 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9200, loss[loss=0.05564, beats_loss=0.01331, ecapa_loss=0.0001513, whisper_loss=0.04082, over 15592.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01104, ecapa_loss=0.0001747, whisper_loss=0.09137, over 3895962.41 frames. ], batch size: 64, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:49:23,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.540e+01 2.969e+01 3.284e+01 5.041e+01, threshold=5.938e+01, percent-clipped=0.0 2024-08-12 14:49:31,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1686210.0, ans=0.0 2024-08-12 14:49:54,097 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 14:49:54,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1686410.0, ans=0.1 2024-08-12 14:49:55,988 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
24 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-12 14:50:07,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1686410.0, ans=0.09899494936611666 2024-08-12 14:50:24,962 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9250, loss[loss=0.1001, beats_loss=0.01212, ecapa_loss=0.0002025, whisper_loss=0.08596, over 21051.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01096, ecapa_loss=0.0001767, whisper_loss=0.09218, over 3922794.41 frames. ], batch size: 89, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:50:29,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=22.5 2024-08-12 14:50:40,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1686710.0, ans=0.0 2024-08-12 14:50:48,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1686710.0, ans=0.125 2024-08-12 14:51:27,113 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 14:51:32,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1687010.0, ans=0.125 2024-08-12 14:51:39,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1687010.0, ans=0.1 2024-08-12 14:51:49,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9300, loss[loss=0.1037, beats_loss=0.01173, ecapa_loss=0.000197, whisper_loss=0.08996, over 21717.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01096, ecapa_loss=0.0001763, whisper_loss=0.09259, over 3917268.67 frames. 
], batch size: 91, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:51:50,572 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 32 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 14:51:54,583 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 14:52:09,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.511e+01 2.773e+01 3.215e+01 9.080e+01, threshold=5.546e+01, percent-clipped=1.0 2024-08-12 14:52:37,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1687310.0, ans=0.5 2024-08-12 14:53:06,167 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 14:53:14,201 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9350, loss[loss=0.1003, beats_loss=0.009243, ecapa_loss=0.0002137, whisper_loss=0.08896, over 19805.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01091, ecapa_loss=0.0001775, whisper_loss=0.09223, over 3896395.94 frames. ], batch size: 81, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:53:14,399 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 14:53:14,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1687610.0, ans=0.1 2024-08-12 14:53:19,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1687610.0, ans=0.0 2024-08-12 14:53:26,397 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
22 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-12 14:53:26,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1687610.0, ans=0.125 2024-08-12 14:53:43,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1687710.0, ans=0.0 2024-08-12 14:54:04,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1687810.0, ans=0.0 2024-08-12 14:54:21,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1687910.0, ans=0.125 2024-08-12 14:54:30,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2024-08-12 14:54:49,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1688010.0, ans=0.125 2024-08-12 14:54:52,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9400, loss[loss=0.1277, beats_loss=0.008686, ecapa_loss=0.0001924, whisper_loss=0.1171, over 22983.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01089, ecapa_loss=0.0001762, whisper_loss=0.09256, over 3915877.14 frames. 
], batch size: 90, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:55:10,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1688210.0, ans=0.125 2024-08-12 14:55:18,029 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.357e+01 2.577e+01 2.940e+01 4.355e+01, threshold=5.154e+01, percent-clipped=0.0 2024-08-12 14:55:22,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1688210.0, ans=0.0 2024-08-12 14:55:25,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1688210.0, ans=0.0 2024-08-12 14:55:42,818 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 23 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-12 14:56:29,032 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9450, loss[loss=0.1044, beats_loss=0.01054, ecapa_loss=0.0001792, whisper_loss=0.09208, over 22177.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01089, ecapa_loss=0.0001756, whisper_loss=0.09245, over 3892972.71 frames. ], batch size: 88, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:57:01,013 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.79 vs. 
limit=15.0 2024-08-12 14:57:08,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1688810.0, ans=0.2 2024-08-12 14:57:21,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1688810.0, ans=0.125 2024-08-12 14:57:26,180 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09947884827852249, model_norm_threshold=51.535552978515625 2024-08-12 14:57:26,374 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.99, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.656e+05, grad_sumsq=2.952e+04, orig_rms_sq=8.999e+00 2024-08-12 14:57:32,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.66 vs. limit=10.0 2024-08-12 14:57:37,289 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 14:57:55,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1689010.0, ans=0.0 2024-08-12 14:58:02,634 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9500, loss[loss=0.09281, beats_loss=0.01014, ecapa_loss=0.0002199, whisper_loss=0.08047, over 21031.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01102, ecapa_loss=0.0001752, whisper_loss=0.0913, over 3907980.85 frames. ], batch size: 88, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:58:16,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1689110.0, ans=0.125 2024-08-12 14:58:20,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.59 vs. 
limit=15.0 2024-08-12 14:58:24,352 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 16 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 14:58:25,252 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 2.537e+01 2.807e+01 3.213e+01 5.181e+02, threshold=5.615e+01, percent-clipped=1.0 2024-08-12 14:58:30,945 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 14:58:57,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-12 14:59:02,622 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 14:59:05,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1689410.0, ans=0.125 2024-08-12 14:59:06,171 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 14:59:10,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1689410.0, ans=0.1 2024-08-12 14:59:26,871 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9550, loss[loss=0.09862, beats_loss=0.0101, ecapa_loss=0.0001681, whisper_loss=0.08683, over 19128.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01094, ecapa_loss=0.0001753, whisper_loss=0.09178, over 3874334.28 frames. ], batch size: 75, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:59:30,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1689610.0, ans=0.1 2024-08-12 14:59:31,347 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 14:59:42,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1689710.0, ans=0.0 2024-08-12 14:59:43,319 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 15:00:04,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1689810.0, ans=0.1 2024-08-12 15:00:11,096 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 9 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 15:00:14,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.77 vs. limit=15.0 2024-08-12 15:00:31,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1690010.0, ans=0.09899494936611666 2024-08-12 15:00:32,032 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 15:00:35,970 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9600, loss[loss=0.1129, beats_loss=0.01122, ecapa_loss=0.0002577, whisper_loss=0.09912, over 14323.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01094, ecapa_loss=0.0001744, whisper_loss=0.09202, over 3868879.51 frames. ], batch size: 61, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:00:45,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1690110.0, ans=0.0 2024-08-12 15:00:53,343 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.591e+01 2.857e+01 3.252e+01 5.691e+01, threshold=5.714e+01, percent-clipped=2.0 2024-08-12 15:00:55,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.35 vs. 
limit=10.0 2024-08-12 15:00:57,709 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 15:00:58,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1690210.0, ans=0.125 2024-08-12 15:01:11,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1690310.0, ans=0.0 2024-08-12 15:01:15,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1690410.0, ans=0.1 2024-08-12 15:01:16,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1690410.0, ans=0.125 2024-08-12 15:01:23,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1690410.0, ans=0.2 2024-08-12 15:01:31,003 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2024-08-12 15:01:44,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9650, loss[loss=0.1066, beats_loss=0.009845, ecapa_loss=0.00016, whisper_loss=0.09519, over 17191.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0109, ecapa_loss=0.000174, whisper_loss=0.09206, over 3860319.32 frames. 
], batch size: 66, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:01:58,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1690710.0, ans=0.0 2024-08-12 15:02:02,809 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 15:02:04,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1690710.0, ans=0.125 2024-08-12 15:02:06,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1690710.0, ans=0.125 2024-08-12 15:02:29,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1690910.0, ans=0.0 2024-08-12 15:02:33,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1690910.0, ans=0.125 2024-08-12 15:02:41,353 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.28 vs. limit=15.0 2024-08-12 15:02:43,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1691010.0, ans=0.125 2024-08-12 15:02:50,379 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 15:02:50,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1691010.0, ans=0.0 2024-08-12 15:02:51,572 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 15:02:52,996 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9700, loss[loss=0.09771, beats_loss=0.01013, ecapa_loss=0.0001722, whisper_loss=0.08586, over 22491.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01084, ecapa_loss=0.0001753, whisper_loss=0.09233, over 3869690.83 frames. ], batch size: 92, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:03:10,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1691210.0, ans=0.2 2024-08-12 15:03:10,782 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.535e+01 2.821e+01 3.429e+01 6.519e+01, threshold=5.641e+01, percent-clipped=1.0 2024-08-12 15:03:10,946 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 15:03:16,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=15.0 2024-08-12 15:03:22,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1691310.0, ans=10.0 2024-08-12 15:03:26,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1691310.0, ans=0.125 2024-08-12 15:03:43,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1691410.0, ans=0.0 2024-08-12 15:04:04,248 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9750, loss[loss=0.1043, beats_loss=0.009585, ecapa_loss=0.0001909, whisper_loss=0.09283, over 16247.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01086, ecapa_loss=0.0001744, whisper_loss=0.0918, over 3857643.95 frames. 
], batch size: 62, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:04:05,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1691610.0, ans=0.0 2024-08-12 15:04:07,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1691610.0, ans=0.0 2024-08-12 15:04:17,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.19 vs. limit=10.0 2024-08-12 15:04:23,642 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 15:04:29,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1691710.0, ans=10.0 2024-08-12 15:04:56,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1691910.0, ans=0.125 2024-08-12 15:04:59,157 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 14 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-12 15:05:01,595 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 15:05:07,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1692010.0, ans=0.125 2024-08-12 15:05:12,533 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9800, loss[loss=0.1089, beats_loss=0.00969, ecapa_loss=0.000193, whisper_loss=0.09726, over 23048.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001746, whisper_loss=0.09148, over 3867249.01 frames. ], batch size: 90, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:05:20,580 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 15:05:22,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1692110.0, ans=0.1 2024-08-12 15:05:30,090 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.561e+01 2.818e+01 3.285e+01 1.389e+02, threshold=5.636e+01, percent-clipped=4.0 2024-08-12 15:05:41,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1692310.0, ans=0.2 2024-08-12 15:05:52,486 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 15:05:56,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1692410.0, ans=0.2 2024-08-12 15:06:02,897 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 15:06:17,029 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 15:06:19,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9850, loss[loss=0.1164, beats_loss=0.009958, ecapa_loss=0.0001931, whisper_loss=0.1045, over 21517.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01085, ecapa_loss=0.0001752, whisper_loss=0.09245, over 3887742.03 frames. ], batch size: 90, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:06:29,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1692610.0, ans=0.125 2024-08-12 15:06:33,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1692710.0, ans=0.0 2024-08-12 15:06:36,308 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 15:06:54,386 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
20 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 15:06:54,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0 2024-08-12 15:07:13,015 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.83 vs. limit=15.0 2024-08-12 15:07:15,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1693010.0, ans=0.125 2024-08-12 15:07:15,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1693010.0, ans=0.125 2024-08-12 15:07:16,515 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 15:07:28,319 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9900, loss[loss=0.1125, beats_loss=0.008844, ecapa_loss=0.0001788, whisper_loss=0.1019, over 17534.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0109, ecapa_loss=0.0001754, whisper_loss=0.09248, over 3912673.06 frames. ], batch size: 67, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:07:40,148 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 15:07:46,541 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.533e+01 2.789e+01 3.190e+01 6.872e+01, threshold=5.578e+01, percent-clipped=1.0 2024-08-12 15:07:51,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1693210.0, ans=0.125 2024-08-12 15:07:55,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1693310.0, ans=0.125 2024-08-12 15:07:59,489 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-12 15:08:19,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1693410.0, ans=0.125 2024-08-12 15:08:27,802 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0 2024-08-12 15:08:34,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1693510.0, ans=0.1 2024-08-12 15:08:38,182 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.46 vs. limit=15.0 2024-08-12 15:08:38,421 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 9950, loss[loss=0.1084, beats_loss=0.0122, ecapa_loss=0.0002174, whisper_loss=0.09407, over 20440.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01094, ecapa_loss=0.0001746, whisper_loss=0.09229, over 3904751.93 frames. ], batch size: 87, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:08:38,674 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
32 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 15:08:40,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1693610.0, ans=0.0 2024-08-12 15:08:51,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1693710.0, ans=0.125 2024-08-12 15:08:59,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1693710.0, ans=0.04949747468305833 2024-08-12 15:09:16,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1693810.0, ans=0.0 2024-08-12 15:09:18,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=15.0 2024-08-12 15:09:22,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1693910.0, ans=0.1 2024-08-12 15:09:36,894 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 36 from Vox, 30 fro AS 2024-08-12 15:09:54,012 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10000, loss[loss=0.1233, beats_loss=0.009435, ecapa_loss=0.0001243, whisper_loss=0.1126, over 17580.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01092, ecapa_loss=0.0001754, whisper_loss=0.0926, over 3893378.26 frames. ], batch size: 64, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:09:56,820 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 15:09:58,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1694110.0, ans=0.125 2024-08-12 15:10:11,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.583e+01 2.831e+01 3.339e+01 3.966e+02, threshold=5.663e+01, percent-clipped=2.0 2024-08-12 15:10:13,208 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 15:10:21,628 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 15:10:22,305 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0 2024-08-12 15:10:41,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1694410.0, ans=0.125 2024-08-12 15:10:50,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1694510.0, ans=0.0 2024-08-12 15:10:59,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1694510.0, ans=0.125 2024-08-12 15:11:01,721 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10050, loss[loss=0.1069, beats_loss=0.01068, ecapa_loss=0.0001845, whisper_loss=0.09438, over 21442.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01088, ecapa_loss=0.000177, whisper_loss=0.09245, over 3900461.60 frames. ], batch size: 88, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:11:02,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1694610.0, ans=10.0 2024-08-12 15:11:03,322 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
27 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-12 15:11:10,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1694610.0, ans=0.1 2024-08-12 15:11:18,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1694710.0, ans=0.1 2024-08-12 15:11:22,145 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 15:11:23,704 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 15:11:27,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1694710.0, ans=0.125 2024-08-12 15:12:14,398 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10100, loss[loss=0.1411, beats_loss=0.008005, ecapa_loss=0.0002045, whisper_loss=0.131, over 19575.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01092, ecapa_loss=0.0001769, whisper_loss=0.09221, over 3916972.62 frames. ], batch size: 78, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:12:14,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1695110.0, ans=0.0 2024-08-12 15:12:19,659 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 15:12:33,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.463e+01 2.716e+01 3.042e+01 6.161e+01, threshold=5.433e+01, percent-clipped=3.0 2024-08-12 15:12:49,565 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-12 15:13:18,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1695510.0, ans=0.125 2024-08-12 15:13:21,202 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 15:13:21,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1695510.0, ans=0.0 2024-08-12 15:13:25,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=15.0 2024-08-12 15:13:28,881 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10150, loss[loss=0.09692, beats_loss=0.01281, ecapa_loss=0.0002011, whisper_loss=0.0821, over 20980.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01092, ecapa_loss=0.0001774, whisper_loss=0.09223, over 3915237.89 frames. ], batch size: 92, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:13:36,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1695610.0, ans=0.1 2024-08-12 15:13:36,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1695610.0, ans=0.125 2024-08-12 15:13:53,406 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-12 15:13:58,682 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 15:14:01,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1695810.0, ans=0.125 2024-08-12 15:14:08,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1695910.0, ans=0.125 2024-08-12 15:14:13,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.44 vs. 
limit=15.0 2024-08-12 15:14:14,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1695910.0, ans=0.035 2024-08-12 15:14:36,533 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10200, loss[loss=0.07813, beats_loss=0.01305, ecapa_loss=0.0001845, whisper_loss=0.06323, over 19183.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01082, ecapa_loss=0.0001783, whisper_loss=0.09263, over 3909255.84 frames. ], batch size: 80, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:14:46,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1696110.0, ans=0.0 2024-08-12 15:14:54,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.514e+01 2.832e+01 3.281e+01 6.809e+01, threshold=5.664e+01, percent-clipped=1.0 2024-08-12 15:14:54,946 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 15:15:03,668 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.88 vs. limit=15.0 2024-08-12 15:15:14,129 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-12 15:15:46,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10250, loss[loss=0.1073, beats_loss=0.01174, ecapa_loss=0.0001616, whisper_loss=0.09398, over 23147.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01084, ecapa_loss=0.0001782, whisper_loss=0.09318, over 3944690.59 frames. 
], batch size: 91, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:15:49,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1696610.0, ans=0.125 2024-08-12 15:16:04,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1696710.0, ans=0.0 2024-08-12 15:16:07,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1696710.0, ans=0.125 2024-08-12 15:16:15,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1696810.0, ans=0.0 2024-08-12 15:16:43,820 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 33 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 15:16:57,059 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10300, loss[loss=0.09802, beats_loss=0.01115, ecapa_loss=0.0001768, whisper_loss=0.0851, over 21267.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01096, ecapa_loss=0.0001775, whisper_loss=0.09239, over 3907566.08 frames. ], batch size: 88, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:17:04,973 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 15:17:16,463 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.570e+01 2.801e+01 3.230e+01 4.716e+01, threshold=5.603e+01, percent-clipped=0.0 2024-08-12 15:17:38,701 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 15:17:48,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1697410.0, ans=0.1 2024-08-12 15:17:52,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1697410.0, ans=0.125 2024-08-12 15:18:09,642 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10350, loss[loss=0.1057, beats_loss=0.01146, ecapa_loss=0.0001351, whisper_loss=0.09285, over 17293.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01104, ecapa_loss=0.0001762, whisper_loss=0.09229, over 3917656.51 frames. ], batch size: 66, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:18:34,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1697710.0, ans=0.125 2024-08-12 15:18:45,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=15.0 2024-08-12 15:19:02,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1697910.0, ans=0.07 2024-08-12 15:19:02,932 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 33 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 15:19:03,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=12.0 2024-08-12 15:19:17,180 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10400, loss[loss=0.09946, beats_loss=0.01041, ecapa_loss=0.0001866, whisper_loss=0.08719, over 20181.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01101, ecapa_loss=0.000176, whisper_loss=0.09192, over 3935913.76 frames. 
], batch size: 81, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:19:17,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2024-08-12 15:19:22,879 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 15:19:24,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1698110.0, ans=0.0 2024-08-12 15:19:30,118 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-12 15:19:30,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1698210.0, ans=0.0 2024-08-12 15:19:35,332 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.431e+01 2.766e+01 3.090e+01 4.882e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 15:19:45,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1698310.0, ans=0.0 2024-08-12 15:19:47,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1698310.0, ans=0.1 2024-08-12 15:19:56,357 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 15:19:59,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1698410.0, ans=0.125 2024-08-12 15:20:07,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1698410.0, ans=0.015 2024-08-12 15:20:08,540 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
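A note on the recurring `optim.py` `Clipping_scale` entries in this log: each reports the grad-norm quartiles (min, 25th, 50th, 75th percentile, max) followed by a threshold, and across these entries the threshold consistently equals `Clipping_scale` (2.0) times the median grad-norm (e.g. median 2.832e+01 with threshold 5.664e+01 above). A minimal sketch of that relationship follows; `clip_threshold` is an illustrative helper, not icefall's actual implementation:

```python
def clip_threshold(grad_norm_history, clipping_scale=2.0):
    """Illustrative only: threshold = clipping_scale * median of recent grad norms.

    Mirrors the relationship visible in the log's grad-norm quartile entries,
    where the reported threshold is 2x the 50th-percentile grad-norm.
    """
    s = sorted(grad_norm_history)
    n = len(s)
    # Median: middle element for odd n, mean of the two middle elements for even n.
    median = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    return clipping_scale * median

# Quartile values from the batch 10200 entry (1.912e+01 ... 6.809e+01);
# the median 28.32 doubles to the reported threshold 5.664e+01:
print(clip_threshold([19.12, 25.14, 28.32, 32.81, 68.09]))  # 56.64
```

The `percent-clipped` field in the same entries then reports how often batch grad-norms exceeded this moving threshold.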
24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-12 15:20:19,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1698510.0, ans=0.0 2024-08-12 15:20:21,718 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 15:20:24,523 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10450, loss[loss=0.1066, beats_loss=0.012, ecapa_loss=0.0002183, whisper_loss=0.09244, over 20731.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01095, ecapa_loss=0.0001778, whisper_loss=0.09206, over 3913392.77 frames. ], batch size: 93, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:20:52,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1698810.0, ans=0.125 2024-08-12 15:21:07,338 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 15:21:18,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1699010.0, ans=0.125 2024-08-12 15:21:30,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1699010.0, ans=0.125 2024-08-12 15:21:32,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10500, loss[loss=0.09898, beats_loss=0.0124, ecapa_loss=0.0001413, whisper_loss=0.08517, over 22876.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01093, ecapa_loss=0.000178, whisper_loss=0.09182, over 3906871.42 frames. ], batch size: 90, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:21:39,666 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 15:21:41,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1699110.0, ans=0.125 2024-08-12 15:21:43,921 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 15:21:50,590 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.539e+01 2.734e+01 3.108e+01 4.878e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-12 15:21:50,822 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 15:21:52,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1699210.0, ans=0.5 2024-08-12 15:22:08,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1699310.0, ans=0.0 2024-08-12 15:22:15,413 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 15:22:21,981 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 8 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-12 15:22:22,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1699410.0, ans=0.0 2024-08-12 15:22:24,662 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-12 15:22:40,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10550, loss[loss=0.138, beats_loss=0.007213, ecapa_loss=0.000211, whisper_loss=0.1287, over 18192.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001768, whisper_loss=0.09195, over 3877116.83 frames. 
], batch size: 71, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:22:58,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1699710.0, ans=0.2 2024-08-12 15:23:03,255 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 15:23:33,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1699910.0, ans=0.125 2024-08-12 15:23:35,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1699910.0, ans=0.125 2024-08-12 15:23:48,793 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 15:23:52,954 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10600, loss[loss=0.1147, beats_loss=0.008173, ecapa_loss=0.0002018, whisper_loss=0.1045, over 18798.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01086, ecapa_loss=0.000178, whisper_loss=0.09197, over 3885335.43 frames. 
], batch size: 70, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:23:53,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1700110.0, ans=0.125 2024-08-12 15:24:06,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1700210.0, ans=0.09899494936611666 2024-08-12 15:24:13,290 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.487e+01 2.727e+01 3.054e+01 5.238e+01, threshold=5.453e+01, percent-clipped=0.0 2024-08-12 15:24:21,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1700310.0, ans=0.125 2024-08-12 15:24:30,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1700310.0, ans=0.125 2024-08-12 15:24:40,603 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 15:24:52,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1700510.0, ans=0.125 2024-08-12 15:25:07,458 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10650, loss[loss=0.1103, beats_loss=0.01191, ecapa_loss=0.0001509, whisper_loss=0.09688, over 14626.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0109, ecapa_loss=0.0001765, whisper_loss=0.09171, over 3880486.95 frames. ], batch size: 58, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:25:16,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1700610.0, ans=0.125 2024-08-12 15:25:20,669 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.332e+05 2024-08-12 15:25:33,808 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
28 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-12 15:25:34,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1700710.0, ans=0.125 2024-08-12 15:25:52,550 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2024-08-12 15:25:57,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1700910.0, ans=0.2 2024-08-12 15:26:20,356 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10700, loss[loss=0.1127, beats_loss=0.01035, ecapa_loss=0.0001688, whisper_loss=0.1007, over 17228.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01086, ecapa_loss=0.0001749, whisper_loss=0.09283, over 3896305.80 frames. ], batch size: 68, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:26:30,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.95 vs. limit=15.0 2024-08-12 15:26:31,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1701110.0, ans=0.2 2024-08-12 15:26:32,379 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
20 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 15:26:37,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1701210.0, ans=0.125 2024-08-12 15:26:39,603 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.524e+01 2.760e+01 3.145e+01 5.039e+01, threshold=5.520e+01, percent-clipped=0.0 2024-08-12 15:26:40,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1701210.0, ans=0.0 2024-08-12 15:27:06,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1701410.0, ans=0.0 2024-08-12 15:27:25,640 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 15:27:27,882 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10750, loss[loss=0.1195, beats_loss=0.008539, ecapa_loss=0.0001741, whisper_loss=0.1092, over 19305.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01093, ecapa_loss=0.0001748, whisper_loss=0.09285, over 3903860.42 frames. ], batch size: 77, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:27:46,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1701710.0, ans=0.0 2024-08-12 15:27:53,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1701810.0, ans=0.1 2024-08-12 15:28:27,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.86 vs. 
limit=22.5 2024-08-12 15:28:29,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1702010.0, ans=0.1 2024-08-12 15:28:29,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2024-08-12 15:28:33,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1702010.0, ans=0.025 2024-08-12 15:28:33,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1702010.0, ans=0.125 2024-08-12 15:28:35,254 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10800, loss[loss=0.1076, beats_loss=0.01035, ecapa_loss=0.0001884, whisper_loss=0.09532, over 22508.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01093, ecapa_loss=0.0001753, whisper_loss=0.09364, over 3901653.88 frames. ], batch size: 91, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:28:39,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1702110.0, ans=0.125 2024-08-12 15:28:41,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1702110.0, ans=0.125 2024-08-12 15:28:50,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1702210.0, ans=0.125 2024-08-12 15:28:54,373 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.536e+01 2.905e+01 3.267e+01 1.637e+02, threshold=5.810e+01, percent-clipped=2.0 2024-08-12 15:28:59,771 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 15:29:06,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1702310.0, ans=0.125 2024-08-12 15:29:17,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1702410.0, ans=0.09899494936611666 2024-08-12 15:29:17,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.01 vs. limit=10.0 2024-08-12 15:29:18,243 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-12 15:29:19,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1702410.0, ans=0.125 2024-08-12 15:29:23,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1702410.0, ans=0.125 2024-08-12 15:29:39,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1702510.0, ans=0.0 2024-08-12 15:29:42,186 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-08-12 15:29:42,644 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10850, loss[loss=0.1109, beats_loss=0.01128, ecapa_loss=0.0001769, whisper_loss=0.09788, over 23020.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01094, ecapa_loss=0.0001754, whisper_loss=0.09348, over 3921600.73 frames. ], batch size: 92, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:29:46,971 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 15:30:02,619 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
30 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 15:30:05,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1702710.0, ans=0.125 2024-08-12 15:30:06,149 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 15:30:14,244 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-12 15:30:33,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1702910.0, ans=0.125 2024-08-12 15:30:50,225 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10900, loss[loss=0.1144, beats_loss=0.008926, ecapa_loss=0.0001623, whisper_loss=0.1039, over 18475.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01091, ecapa_loss=0.0001746, whisper_loss=0.09353, over 3952552.12 frames. ], batch size: 72, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:31:01,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1703110.0, ans=0.2 2024-08-12 15:31:08,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.493e+01 2.855e+01 3.171e+01 4.648e+01, threshold=5.710e+01, percent-clipped=0.0 2024-08-12 15:31:13,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1703210.0, ans=0.125 2024-08-12 15:31:17,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1703310.0, ans=0.1 2024-08-12 15:31:23,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1703310.0, ans=0.125 2024-08-12 15:31:33,137 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
32 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 15:31:44,044 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-12 15:31:44,410 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.160e-01 2024-08-12 15:31:57,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 10950, loss[loss=0.104, beats_loss=0.01305, ecapa_loss=0.0001664, whisper_loss=0.08934, over 20838.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0109, ecapa_loss=0.0001747, whisper_loss=0.09353, over 3913989.44 frames. ], batch size: 85, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:31:57,380 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 15:32:01,546 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 15:32:01,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1703610.0, ans=0.0 2024-08-12 15:32:25,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1703810.0, ans=0.0 2024-08-12 15:32:46,642 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2024-08-12 15:32:48,703 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 15:32:53,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1703910.0, ans=0.125 2024-08-12 15:32:54,286 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
22 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-12 15:33:13,214 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11000, loss[loss=0.1455, beats_loss=0.005461, ecapa_loss=0.0001749, whisper_loss=0.1382, over 18091.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01088, ecapa_loss=0.0001757, whisper_loss=0.09325, over 3916955.14 frames. ], batch size: 65, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:33:20,380 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 15:33:24,439 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 15:33:24,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1704110.0, ans=0.125 2024-08-12 15:33:27,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1704210.0, ans=0.0 2024-08-12 15:33:32,671 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.453e+01 2.776e+01 3.261e+01 5.617e+01, threshold=5.552e+01, percent-clipped=0.0 2024-08-12 15:33:35,759 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 15:33:39,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1704310.0, ans=0.0 2024-08-12 15:33:49,284 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 15:33:50,275 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=12.0 2024-08-12 15:33:53,525 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 15:33:53,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1704410.0, ans=0.125 2024-08-12 15:33:55,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1704410.0, ans=0.125 2024-08-12 15:34:00,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1704410.0, ans=0.2 2024-08-12 15:34:03,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1704410.0, ans=0.1 2024-08-12 15:34:13,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1704510.0, ans=0.1 2024-08-12 15:34:14,846 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 15:34:17,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-12 15:34:17,843 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-12 15:34:21,531 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11050, loss[loss=0.0965, beats_loss=0.01341, ecapa_loss=0.0001735, whisper_loss=0.08135, over 21289.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01088, ecapa_loss=0.0001765, whisper_loss=0.0932, over 3943513.43 frames. 
], batch size: 88, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:34:24,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1704610.0, ans=0.125 2024-08-12 15:34:28,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1704610.0, ans=0.2 2024-08-12 15:34:41,624 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 15:34:47,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1704810.0, ans=0.95 2024-08-12 15:34:48,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=12.0 2024-08-12 15:34:59,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1704810.0, ans=0.0 2024-08-12 15:35:00,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1704910.0, ans=0.125 2024-08-12 15:35:01,768 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 15:35:07,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1704910.0, ans=0.2 2024-08-12 15:35:09,018 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.79 vs. 
limit=15.0 2024-08-12 15:35:18,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1705010.0, ans=0.125 2024-08-12 15:35:26,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1705010.0, ans=0.125 2024-08-12 15:35:29,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11100, loss[loss=0.09829, beats_loss=0.0101, ecapa_loss=0.000186, whisper_loss=0.08633, over 16106.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01091, ecapa_loss=0.0001757, whisper_loss=0.09268, over 3916961.69 frames. ], batch size: 66, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:35:32,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1705110.0, ans=0.125 2024-08-12 15:35:34,809 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 15:35:36,280 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 15:35:45,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1705210.0, ans=0.0 2024-08-12 15:35:45,985 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
22 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-12 15:35:48,532 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.398e+01 2.655e+01 3.117e+01 6.342e+01, threshold=5.309e+01, percent-clipped=1.0 2024-08-12 15:35:50,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1705210.0, ans=0.2 2024-08-12 15:36:07,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1705310.0, ans=0.1 2024-08-12 15:36:07,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1705310.0, ans=0.125 2024-08-12 15:36:33,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1705510.0, ans=0.0 2024-08-12 15:36:34,601 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 15:36:34,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1705510.0, ans=0.1 2024-08-12 15:36:38,741 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11150, loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0002072, whisper_loss=0.09016, over 14754.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01095, ecapa_loss=0.0001749, whisper_loss=0.09207, over 3868628.91 frames. 
], batch size: 61, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:36:40,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1705610.0, ans=0.1 2024-08-12 15:37:17,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1705910.0, ans=0.125 2024-08-12 15:37:24,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1705910.0, ans=0.125 2024-08-12 15:37:27,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1705910.0, ans=0.125 2024-08-12 15:37:29,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1705910.0, ans=0.125 2024-08-12 15:37:33,920 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 15:37:46,074 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11200, loss[loss=0.1066, beats_loss=0.01056, ecapa_loss=0.0001426, whisper_loss=0.09462, over 16608.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01095, ecapa_loss=0.0001748, whisper_loss=0.09175, over 3885425.36 frames. ], batch size: 62, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:37:46,297 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 15:37:57,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1706110.0, ans=0.0 2024-08-12 15:38:05,489 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.490e+01 2.836e+01 3.047e+01 5.086e+01, threshold=5.671e+01, percent-clipped=0.0 2024-08-12 15:38:09,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-08-12 15:38:14,018 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2024-08-12 15:38:25,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.47 vs. limit=15.0 2024-08-12 15:38:27,111 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 15:38:29,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1706410.0, ans=0.0 2024-08-12 15:38:30,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.83 vs. limit=10.0 2024-08-12 15:38:44,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1706510.0, ans=0.2 2024-08-12 15:38:48,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1706510.0, ans=0.125 2024-08-12 15:38:53,734 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11250, loss[loss=0.1255, beats_loss=0.00801, ecapa_loss=0.00017, whisper_loss=0.1158, over 20194.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.000175, whisper_loss=0.09229, over 3877805.12 frames. ], batch size: 74, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:39:11,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1706710.0, ans=0.0 2024-08-12 15:39:15,518 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-12 15:39:17,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=1706710.0, ans=0.2 2024-08-12 15:39:17,596 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5 2024-08-12 15:39:27,252 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2024-08-12 15:39:31,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1706810.0, ans=0.2 2024-08-12 15:39:41,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0 2024-08-12 15:40:01,554 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11300, loss[loss=0.1027, beats_loss=0.0111, ecapa_loss=0.0001909, whisper_loss=0.08966, over 21619.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0109, ecapa_loss=0.0001748, whisper_loss=0.0926, over 3863870.52 frames. 
], batch size: 92, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:40:20,378 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.567e+01 2.768e+01 3.157e+01 8.223e+01, threshold=5.536e+01, percent-clipped=2.0 2024-08-12 15:40:34,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1707310.0, ans=0.125 2024-08-12 15:41:06,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.24 vs. limit=10.0 2024-08-12 15:41:08,425 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-12 15:41:10,189 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11350, loss[loss=0.1176, beats_loss=0.0086, ecapa_loss=0.0001446, whisper_loss=0.1075, over 16348.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01088, ecapa_loss=0.0001751, whisper_loss=0.0927, over 3894854.82 frames. ], batch size: 57, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:41:11,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1707610.0, ans=0.2 2024-08-12 15:41:18,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0 2024-08-12 15:41:20,579 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 15:41:27,019 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 40 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 15:41:34,156 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 15:41:34,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1707710.0, ans=0.125 2024-08-12 15:41:35,592 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-12 15:41:36,043 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.08 vs. limit=10.0 2024-08-12 15:41:52,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1707910.0, ans=0.2 2024-08-12 15:42:00,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1707910.0, ans=0.1 2024-08-12 15:42:02,913 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-12 15:42:17,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1708110.0, ans=0.0 2024-08-12 15:42:17,876 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11400, loss[loss=0.08345, beats_loss=0.01177, ecapa_loss=0.0001315, whisper_loss=0.07036, over 14462.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01085, ecapa_loss=0.0001752, whisper_loss=0.09298, over 3895619.59 frames. ], batch size: 57, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:42:22,055 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 15:42:27,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1708110.0, ans=0.2 2024-08-12 15:42:36,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.714e+01 3.019e+01 3.288e+01 4.590e+01, threshold=6.038e+01, percent-clipped=0.0 2024-08-12 15:42:41,044 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 15:42:45,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1708310.0, ans=0.125 2024-08-12 15:43:00,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1708410.0, ans=0.125 2024-08-12 15:43:01,793 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-12 15:43:05,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1708410.0, ans=0.125 2024-08-12 15:43:11,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1708510.0, ans=0.125 2024-08-12 15:43:22,079 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 15:43:23,627 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 15:43:25,901 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11450, loss[loss=0.09373, beats_loss=0.0121, ecapa_loss=0.0001522, whisper_loss=0.08011, over 22054.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01088, ecapa_loss=0.0001746, whisper_loss=0.09263, over 3897752.84 frames. 
], batch size: 88, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:43:38,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=15.0 2024-08-12 15:43:39,131 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-12 15:43:43,647 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 15:43:46,121 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 15:43:49,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1708710.0, ans=0.0 2024-08-12 15:44:23,084 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2024-08-12 15:44:34,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11500, loss[loss=0.09026, beats_loss=0.0128, ecapa_loss=0.000159, whisper_loss=0.07586, over 22444.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0109, ecapa_loss=0.0001747, whisper_loss=0.09219, over 3900026.66 frames. ], batch size: 93, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:44:39,741 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-12 15:44:41,131 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
31 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-12 15:44:46,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1709110.0, ans=0.125 2024-08-12 15:44:54,107 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.425e+01 2.764e+01 3.070e+01 5.781e+01, threshold=5.529e+01, percent-clipped=0.0 2024-08-12 15:44:54,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1709210.0, ans=0.125 2024-08-12 15:45:08,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1709310.0, ans=0.1 2024-08-12 15:45:34,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1709510.0, ans=0.125 2024-08-12 15:45:35,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1709510.0, ans=0.2 2024-08-12 15:45:43,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1709510.0, ans=0.125 2024-08-12 15:45:47,345 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11550, loss[loss=0.1132, beats_loss=0.008322, ecapa_loss=0.0001449, whisper_loss=0.1034, over 18883.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01085, ecapa_loss=0.0001743, whisper_loss=0.09277, over 3897574.75 frames. 
], batch size: 71, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:45:51,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1709610.0, ans=0.125 2024-08-12 15:45:52,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1709610.0, ans=0.125 2024-08-12 15:45:53,529 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 18 from Vox, 53 fro AS 2024-08-12 15:46:11,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1709810.0, ans=0.125 2024-08-12 15:46:26,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.91 vs. limit=15.0 2024-08-12 15:46:56,803 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-12 15:47:04,034 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11600, loss[loss=0.09859, beats_loss=0.01055, ecapa_loss=0.000199, whisper_loss=0.08605, over 18098.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01091, ecapa_loss=0.0001742, whisper_loss=0.09256, over 3925962.84 frames. ], batch size: 76, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:47:08,414 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 15:47:14,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1710110.0, ans=0.0 2024-08-12 15:47:17,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1710110.0, ans=0.125 2024-08-12 15:47:32,100 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.125e+01 2.592e+01 2.931e+01 3.257e+01 5.066e+01, threshold=5.862e+01, percent-clipped=0.0 2024-08-12 15:47:40,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1710210.0, ans=15.0 2024-08-12 15:48:10,129 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 25 from LS+wenet, 18 from Vox, 12 fro AS 2024-08-12 15:48:14,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1710410.0, ans=0.125 2024-08-12 15:48:21,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1710410.0, ans=0.1 2024-08-12 15:48:34,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1710510.0, ans=0.125 2024-08-12 15:48:36,372 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-12 15:48:51,752 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11650, loss[loss=0.09174, beats_loss=0.01214, ecapa_loss=0.0001601, whisper_loss=0.07799, over 21318.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01092, ecapa_loss=0.0001742, whisper_loss=0.09217, over 3907252.63 frames. ], batch size: 88, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:48:51,895 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
25 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 15:49:14,425 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 15:49:41,537 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 vs. limit=10.0 2024-08-12 15:50:10,951 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 15:50:35,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1710910.0, ans=0.0 2024-08-12 15:51:01,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1711010.0, ans=0.125 2024-08-12 15:51:06,335 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11700, loss[loss=0.1109, beats_loss=0.01132, ecapa_loss=0.000129, whisper_loss=0.09833, over 22482.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01101, ecapa_loss=0.0001748, whisper_loss=0.09202, over 3945212.00 frames. ], batch size: 84, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:51:41,825 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-12 15:51:45,604 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.679e+01 3.031e+01 3.384e+01 8.068e+01, threshold=6.063e+01, percent-clipped=1.0 2024-08-12 15:53:20,104 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11750, loss[loss=0.114, beats_loss=0.01157, ecapa_loss=0.0001622, whisper_loss=0.1008, over 22678.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01109, ecapa_loss=0.0001754, whisper_loss=0.09202, over 3955211.74 frames. 
], batch size: 89, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:53:22,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.14 vs. limit=15.0 2024-08-12 15:53:30,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1711610.0, ans=0.1 2024-08-12 15:54:07,514 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 15:54:32,336 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 15:54:41,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1711910.0, ans=0.125 2024-08-12 15:54:48,645 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 15:54:51,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1712010.0, ans=0.0 2024-08-12 15:54:52,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1712010.0, ans=0.0 2024-08-12 15:55:02,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11800, loss[loss=0.1209, beats_loss=0.01066, ecapa_loss=0.0001644, whisper_loss=0.1086, over 16826.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0111, ecapa_loss=0.0001752, whisper_loss=0.09232, over 3956361.34 frames. ], batch size: 64, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:55:06,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. 
limit=6.0 2024-08-12 15:55:30,168 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.421e+01 2.823e+01 3.255e+01 8.063e+01, threshold=5.645e+01, percent-clipped=1.0 2024-08-12 15:55:44,845 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.572e+00 2024-08-12 15:55:59,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1712410.0, ans=0.0 2024-08-12 15:56:13,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1712510.0, ans=0.05 2024-08-12 15:56:31,273 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11850, loss[loss=0.1123, beats_loss=0.007964, ecapa_loss=0.0001824, whisper_loss=0.1025, over 15508.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01121, ecapa_loss=0.0001748, whisper_loss=0.09117, over 3936804.82 frames. ], batch size: 60, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:56:45,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1712610.0, ans=0.1 2024-08-12 15:57:00,623 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 15:57:10,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1712810.0, ans=0.1 2024-08-12 15:57:16,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1712810.0, ans=0.125 2024-08-12 15:57:23,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1712910.0, ans=0.125 2024-08-12 15:57:35,829 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
24 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 15:57:44,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1713010.0, ans=0.0 2024-08-12 15:57:51,734 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 15:57:58,483 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11900, loss[loss=0.107, beats_loss=0.01162, ecapa_loss=0.0001599, whisper_loss=0.09376, over 19992.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0112, ecapa_loss=0.0001742, whisper_loss=0.09163, over 3929299.03 frames. ], batch size: 81, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:58:06,348 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 15:58:24,747 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.471e+01 2.746e+01 3.069e+01 1.141e+02, threshold=5.492e+01, percent-clipped=1.0 2024-08-12 15:58:34,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1713310.0, ans=0.125 2024-08-12 15:58:34,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1713310.0, ans=0.1 2024-08-12 15:58:38,123 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.437e+01 2024-08-12 15:58:39,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1713310.0, ans=0.2 2024-08-12 15:58:44,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. 
limit=15.0 2024-08-12 15:58:54,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1713410.0, ans=0.125 2024-08-12 15:58:59,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1713410.0, ans=0.0 2024-08-12 15:59:23,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.16 vs. limit=22.5 2024-08-12 15:59:24,024 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 11950, loss[loss=0.07979, beats_loss=0.01353, ecapa_loss=0.0001931, whisper_loss=0.06433, over 17874.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01113, ecapa_loss=0.0001735, whisper_loss=0.09135, over 3892975.15 frames. ], batch size: 74, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:59:24,380 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 15:59:24,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1713610.0, ans=0.0 2024-08-12 15:59:28,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1713610.0, ans=0.125 2024-08-12 16:00:04,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1713810.0, ans=0.0 2024-08-12 16:00:06,032 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
12 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 16:00:17,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1713910.0, ans=0.125 2024-08-12 16:00:27,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1713910.0, ans=0.125 2024-08-12 16:00:29,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1713910.0, ans=0.2 2024-08-12 16:00:31,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1713910.0, ans=0.125 2024-08-12 16:00:36,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1714010.0, ans=0.125 2024-08-12 16:00:50,427 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12000, loss[loss=0.1157, beats_loss=0.01005, ecapa_loss=0.000216, whisper_loss=0.1035, over 17562.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0111, ecapa_loss=0.0001747, whisper_loss=0.0914, over 3884891.05 frames. ], batch size: 73, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:00:50,428 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 16:01:32,460 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005955, whisper_loss=0.2482, over 922467.00 frames. 2024-08-12 16:01:52,014 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on SV_voxceleb1: loss=0.004759, beats_loss=0, ecapa_loss=0.0004759, whisper_loss=0, over 939242.00 frames. 2024-08-12 16:03:43,604 INFO [train_multi_KD3.py:1149] (2/4) Epoch 12, validation on AT_audioset: loss=0.02413, beats_loss=0.02413, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 16:03:43,608 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 16:03:50,950 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 16:04:06,639 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.437e+01 2.734e+01 3.186e+01 7.564e+01, threshold=5.468e+01, percent-clipped=2.0 2024-08-12 16:04:13,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1714310.0, ans=0.0 2024-08-12 16:04:23,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1714310.0, ans=0.035 2024-08-12 16:04:36,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1714410.0, ans=0.95 2024-08-12 16:04:55,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1714510.0, ans=0.125 2024-08-12 16:04:59,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12050, loss[loss=0.1095, beats_loss=0.009282, ecapa_loss=0.0001966, whisper_loss=0.09824, over 18682.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01109, ecapa_loss=0.0001739, whisper_loss=0.0913, over 3889929.70 frames. ], batch size: 74, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:05:12,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1714710.0, ans=0.125 2024-08-12 16:05:13,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1714710.0, ans=0.0 2024-08-12 16:05:18,479 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 16:05:28,180 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:05:32,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1714810.0, ans=0.125 2024-08-12 16:05:47,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1714910.0, ans=0.2 2024-08-12 16:05:53,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1714910.0, ans=0.125 2024-08-12 16:06:15,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1715110.0, ans=0.125 2024-08-12 16:06:15,832 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12100, loss[loss=0.09985, beats_loss=0.00939, ecapa_loss=0.0002222, whisper_loss=0.08824, over 17924.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.0001737, whisper_loss=0.09189, over 3886170.43 frames. ], batch size: 77, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:06:20,748 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 16:06:29,840 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 16:06:38,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.371e+01 2.653e+01 2.949e+01 4.098e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-12 16:07:00,324 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 16:07:04,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. 
limit=12.0 2024-08-12 16:07:25,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1715510.0, ans=0.0 2024-08-12 16:07:36,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12150, loss[loss=0.1131, beats_loss=0.01104, ecapa_loss=0.0001664, whisper_loss=0.1004, over 17631.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01107, ecapa_loss=0.0001735, whisper_loss=0.09117, over 3835139.66 frames. ], batch size: 69, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:07:42,882 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 23 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-12 16:07:50,827 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 28 from Vox, 13 fro AS 2024-08-12 16:08:05,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1715710.0, ans=0.125 2024-08-12 16:08:09,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1715810.0, ans=0.0 2024-08-12 16:08:31,298 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2024-08-12 16:08:32,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1715910.0, ans=0.125 2024-08-12 16:08:45,153 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 16:08:51,808 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12200, loss[loss=0.1129, beats_loss=0.009846, ecapa_loss=0.0001934, whisper_loss=0.1011, over 15477.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001742, whisper_loss=0.0915, over 3833472.72 frames. 
], batch size: 64, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:08:52,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.83 vs. limit=22.5 2024-08-12 16:09:13,939 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.462e+01 2.887e+01 3.237e+01 1.771e+02, threshold=5.773e+01, percent-clipped=2.0 2024-08-12 16:09:38,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1716410.0, ans=0.1 2024-08-12 16:09:47,490 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 16:09:54,579 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.244e+02 2024-08-12 16:10:07,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12250, loss[loss=0.08244, beats_loss=0.01209, ecapa_loss=0.000167, whisper_loss=0.06868, over 14313.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01111, ecapa_loss=0.0001733, whisper_loss=0.0909, over 3840998.61 frames. ], batch size: 57, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:10:09,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1716610.0, ans=0.125 2024-08-12 16:10:19,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1716610.0, ans=0.0 2024-08-12 16:10:28,562 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
19 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 16:10:30,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1716710.0, ans=0.125 2024-08-12 16:10:49,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1716810.0, ans=0.125 2024-08-12 16:10:59,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1716910.0, ans=0.1 2024-08-12 16:11:04,958 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:11:20,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1717010.0, ans=0.125 2024-08-12 16:11:27,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12300, loss[loss=0.1218, beats_loss=0.01177, ecapa_loss=0.0001599, whisper_loss=0.1084, over 23204.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01109, ecapa_loss=0.0001737, whisper_loss=0.09113, over 3855008.24 frames. ], batch size: 90, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:11:28,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1717110.0, ans=0.125 2024-08-12 16:11:39,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1717110.0, ans=0.0 2024-08-12 16:11:43,321 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.72 vs. 
limit=15.0 2024-08-12 16:11:52,106 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.615e+01 2.930e+01 3.275e+01 9.862e+01, threshold=5.860e+01, percent-clipped=1.0 2024-08-12 16:11:52,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1717210.0, ans=0.1 2024-08-12 16:12:01,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1717310.0, ans=0.125 2024-08-12 16:12:03,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1717310.0, ans=0.1 2024-08-12 16:12:20,818 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 16:12:35,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1717510.0, ans=0.125 2024-08-12 16:12:50,874 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-08-12 16:12:51,307 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12350, loss[loss=0.1272, beats_loss=0.01062, ecapa_loss=0.0001603, whisper_loss=0.1149, over 23042.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01103, ecapa_loss=0.0001744, whisper_loss=0.09228, over 3906280.85 frames. ], batch size: 88, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:13:36,108 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 16:13:47,782 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 16:13:48,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1717910.0, ans=0.125 2024-08-12 16:13:54,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0 2024-08-12 16:14:03,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2024-08-12 16:14:14,232 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12400, loss[loss=0.1096, beats_loss=0.01158, ecapa_loss=0.0001348, whisper_loss=0.09663, over 17013.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.011, ecapa_loss=0.0001739, whisper_loss=0.09176, over 3903921.47 frames. ], batch size: 65, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:14:16,026 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.50 vs. limit=12.0 2024-08-12 16:14:16,641 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 9 from Vox, 32 fro AS 2024-08-12 16:14:21,847 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 18 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 16:14:39,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1718210.0, ans=0.1 2024-08-12 16:14:40,096 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.684e+01 3.067e+01 3.396e+01 5.308e+01, threshold=6.133e+01, percent-clipped=1.0 2024-08-12 16:14:54,445 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
27 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 16:14:59,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1718310.0, ans=10.0 2024-08-12 16:15:01,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1718310.0, ans=0.2 2024-08-12 16:15:04,290 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 16:15:11,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1718410.0, ans=10.0 2024-08-12 16:15:27,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1718510.0, ans=0.07 2024-08-12 16:15:36,913 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12450, loss[loss=0.1019, beats_loss=0.009399, ecapa_loss=0.000152, whisper_loss=0.091, over 19611.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01103, ecapa_loss=0.0001738, whisper_loss=0.09118, over 3912770.30 frames. ], batch size: 73, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:15:37,418 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.879e+01 2024-08-12 16:15:39,125 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:15:51,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.55 vs. limit=10.0 2024-08-12 16:15:58,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1718710.0, ans=0.0 2024-08-12 16:16:01,267 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 16:16:28,380 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 16:16:56,325 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12500, loss[loss=0.1144, beats_loss=0.01, ecapa_loss=0.000148, whisper_loss=0.103, over 19914.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01099, ecapa_loss=0.0001728, whisper_loss=0.09199, over 3907673.24 frames. ], batch size: 76, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:16:56,498 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 16:16:58,284 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 16:17:03,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1719110.0, ans=0.2 2024-08-12 16:17:03,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.45 vs. limit=22.5 2024-08-12 16:17:09,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1719110.0, ans=0.1 2024-08-12 16:17:19,727 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.385e+01 2.736e+01 3.208e+01 9.127e+01, threshold=5.473e+01, percent-clipped=1.0 2024-08-12 16:17:30,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1719310.0, ans=0.04949747468305833 2024-08-12 16:17:39,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1719310.0, ans=0.0 2024-08-12 16:17:52,682 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. 
limit=15.0 2024-08-12 16:17:54,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.71 vs. limit=12.0 2024-08-12 16:17:56,353 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.74 vs. limit=15.0 2024-08-12 16:18:04,396 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-12 16:18:10,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1719510.0, ans=0.125 2024-08-12 16:18:11,435 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 32 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-12 16:18:16,531 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12550, loss[loss=0.1263, beats_loss=0.01007, ecapa_loss=0.0001735, whisper_loss=0.1145, over 17485.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01097, ecapa_loss=0.0001741, whisper_loss=0.09228, over 3913538.24 frames. ], batch size: 68, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:18:32,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1719710.0, ans=0.95 2024-08-12 16:18:41,812 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 16:18:47,902 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 16:18:48,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1719810.0, ans=0.04949747468305833 2024-08-12 16:18:49,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1719810.0, ans=0.07 2024-08-12 16:18:56,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1719810.0, ans=0.125 2024-08-12 16:19:24,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1720010.0, ans=0.2 2024-08-12 16:19:38,881 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12600, loss[loss=0.09115, beats_loss=0.01147, ecapa_loss=0.0001707, whisper_loss=0.07797, over 19153.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01102, ecapa_loss=0.0001746, whisper_loss=0.09198, over 3917372.50 frames. ], batch size: 77, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:20:03,942 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.583e+01 2.914e+01 3.404e+01 5.799e+01, threshold=5.828e+01, percent-clipped=1.0 2024-08-12 16:20:10,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.52 vs. limit=10.0 2024-08-12 16:20:26,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1720410.0, ans=0.0 2024-08-12 16:20:42,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1720510.0, ans=0.125 2024-08-12 16:20:57,620 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 16:20:58,948 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12650, loss[loss=0.09836, beats_loss=0.009112, ecapa_loss=0.0001931, whisper_loss=0.08732, over 14031.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001749, whisper_loss=0.09193, over 3878841.50 frames. ], batch size: 58, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:21:05,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1720610.0, ans=0.0 2024-08-12 16:21:05,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1720610.0, ans=0.1 2024-08-12 16:21:05,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.92 vs. limit=15.0 2024-08-12 16:21:22,446 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 18 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 16:21:28,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1720810.0, ans=0.1 2024-08-12 16:21:36,362 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 16:21:37,609 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 16:21:38,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1720810.0, ans=0.125 2024-08-12 16:21:40,241 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 16:21:41,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.21 vs. 
limit=22.5 2024-08-12 16:21:45,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1720910.0, ans=0.0 2024-08-12 16:22:00,179 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 16:22:00,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1721010.0, ans=0.1 2024-08-12 16:22:02,691 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 16:22:04,882 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-12 16:22:16,301 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12700, loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001979, whisper_loss=0.0905, over 19805.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01102, ecapa_loss=0.0001752, whisper_loss=0.09173, over 3886739.39 frames. ], batch size: 84, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:22:40,114 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.411e+01 2.657e+01 2.975e+01 5.020e+01, threshold=5.313e+01, percent-clipped=0.0 2024-08-12 16:22:41,664 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-12 16:22:46,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1721310.0, ans=0.0 2024-08-12 16:22:53,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1721310.0, ans=0.125 2024-08-12 16:22:58,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.39 vs. 
limit=22.5 2024-08-12 16:23:11,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1721410.0, ans=10.0 2024-08-12 16:23:15,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1721410.0, ans=0.1 2024-08-12 16:23:17,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.61 vs. limit=15.0 2024-08-12 16:23:19,637 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 16:23:23,006 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 16:23:35,776 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12750, loss[loss=0.1117, beats_loss=0.01121, ecapa_loss=0.0001683, whisper_loss=0.09884, over 16899.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01107, ecapa_loss=0.0001742, whisper_loss=0.09172, over 3902231.39 frames. ], batch size: 66, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:23:43,297 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 16:23:43,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1721610.0, ans=0.2 2024-08-12 16:23:49,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1721710.0, ans=0.125 2024-08-12 16:23:49,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1721710.0, ans=0.125 2024-08-12 16:23:59,659 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
24 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-12 16:24:01,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1721710.0, ans=0.125 2024-08-12 16:24:02,355 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 16:24:12,934 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=12.0 2024-08-12 16:24:15,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1721810.0, ans=0.2 2024-08-12 16:24:32,779 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-08-12 16:24:37,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1721910.0, ans=0.0 2024-08-12 16:24:48,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1722010.0, ans=0.0 2024-08-12 16:24:57,967 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12800, loss[loss=0.09872, beats_loss=0.01025, ecapa_loss=0.0001771, whisper_loss=0.0867, over 16126.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01108, ecapa_loss=0.000175, whisper_loss=0.092, over 3922634.32 frames. ], batch size: 61, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:25:14,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1722210.0, ans=0.0 2024-08-12 16:25:18,266 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 16:25:21,761 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.602e+01 2.886e+01 3.279e+01 7.661e+01, threshold=5.773e+01, percent-clipped=1.0 2024-08-12 16:25:23,602 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 16:25:39,428 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 30 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 16:25:43,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=1722310.0, ans=0.1 2024-08-12 16:25:55,638 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 16:25:55,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1722410.0, ans=0.0 2024-08-12 16:26:09,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1722510.0, ans=0.0 2024-08-12 16:26:17,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1722610.0, ans=0.2 2024-08-12 16:26:18,762 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12850, loss[loss=0.1013, beats_loss=0.01206, ecapa_loss=0.0001667, whisper_loss=0.08754, over 18132.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01112, ecapa_loss=0.0001741, whisper_loss=0.09206, over 3920312.59 frames. ], batch size: 74, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:26:24,237 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 16:26:49,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1722710.0, ans=0.0 2024-08-12 16:27:13,713 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.82 vs. limit=10.0 2024-08-12 16:27:23,162 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 16:27:40,669 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12900, loss[loss=0.1141, beats_loss=0.01073, ecapa_loss=0.000171, whisper_loss=0.1017, over 22989.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01114, ecapa_loss=0.0001747, whisper_loss=0.09136, over 3884552.39 frames. ], batch size: 91, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:27:46,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1723110.0, ans=0.05 2024-08-12 16:28:05,186 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.447e+01 2.675e+01 2.950e+01 4.604e+01, threshold=5.350e+01, percent-clipped=0.0 2024-08-12 16:28:09,098 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 16:28:09,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1723210.0, ans=0.125 2024-08-12 16:28:20,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1723310.0, ans=0.125 2024-08-12 16:28:42,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1723410.0, ans=15.0 2024-08-12 16:28:50,036 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
29 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 16:29:01,826 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 16:29:03,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 12950, loss[loss=0.1157, beats_loss=0.01105, ecapa_loss=0.0001966, whisper_loss=0.1027, over 18070.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01115, ecapa_loss=0.0001745, whisper_loss=0.09081, over 3876655.48 frames. ], batch size: 76, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:29:57,834 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 16:29:58,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1723910.0, ans=0.125 2024-08-12 16:30:05,699 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 16:30:25,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1724010.0, ans=0.125 2024-08-12 16:30:26,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0 2024-08-12 16:30:29,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1724110.0, ans=0.025 2024-08-12 16:30:30,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13000, loss[loss=0.09525, beats_loss=0.01077, ecapa_loss=0.0001718, whisper_loss=0.08275, over 17688.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01105, ecapa_loss=0.0001747, whisper_loss=0.09178, over 3872258.34 frames. ], batch size: 73, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:30:30,860 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
19 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-12 16:30:44,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1724110.0, ans=0.125 2024-08-12 16:30:55,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.537e+01 2.771e+01 3.073e+01 6.149e+01, threshold=5.541e+01, percent-clipped=2.0 2024-08-12 16:30:55,583 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 16:30:59,650 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:30:59,960 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=10.0 2024-08-12 16:31:11,392 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-12 16:31:15,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1724310.0, ans=0.125 2024-08-12 16:31:19,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1724310.0, ans=6.0 2024-08-12 16:31:36,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1724510.0, ans=0.07 2024-08-12 16:31:54,589 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13050, loss[loss=0.1096, beats_loss=0.01257, ecapa_loss=0.0001805, whisper_loss=0.09525, over 22430.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01101, ecapa_loss=0.0001753, whisper_loss=0.09145, over 3840447.19 frames. 
], batch size: 90, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:32:02,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1724610.0, ans=0.0 2024-08-12 16:32:09,832 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 16:32:10,582 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.29 vs. limit=22.5 2024-08-12 16:32:27,010 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 16:32:36,252 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 16:32:48,185 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:32:58,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2024-08-12 16:33:17,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13100, loss[loss=0.1197, beats_loss=0.01075, ecapa_loss=0.0001463, whisper_loss=0.1075, over 19350.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01091, ecapa_loss=0.0001752, whisper_loss=0.09246, over 3865023.36 frames. ], batch size: 75, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:33:21,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1725110.0, ans=0.125 2024-08-12 16:33:27,288 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 16:33:41,194 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.633e+01 2.841e+01 3.164e+01 5.259e+01, threshold=5.682e+01, percent-clipped=0.0 2024-08-12 16:33:41,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1725210.0, ans=0.1 2024-08-12 16:33:49,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1725310.0, ans=0.125 2024-08-12 16:33:52,677 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-12 16:33:54,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1725310.0, ans=0.125 2024-08-12 16:33:56,967 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.80 vs. limit=10.0 2024-08-12 16:34:11,573 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 16:34:38,519 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13150, loss[loss=0.1125, beats_loss=0.01409, ecapa_loss=0.0001403, whisper_loss=0.09702, over 20774.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01097, ecapa_loss=0.0001743, whisper_loss=0.09259, over 3860359.49 frames. ], batch size: 81, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:34:48,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1725610.0, ans=0.2 2024-08-12 16:34:49,261 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
20 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 16:35:01,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1725710.0, ans=0.125 2024-08-12 16:35:07,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1725710.0, ans=0.125 2024-08-12 16:35:16,556 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-12 16:35:21,338 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 16:35:25,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1725810.0, ans=0.0 2024-08-12 16:35:31,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1725910.0, ans=0.0 2024-08-12 16:35:34,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1725910.0, ans=0.0 2024-08-12 16:35:37,047 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.67 vs. limit=10.0 2024-08-12 16:35:47,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1726010.0, ans=0.0 2024-08-12 16:35:56,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1726010.0, ans=0.125 2024-08-12 16:35:56,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1726010.0, ans=0.0 2024-08-12 16:36:02,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13200, loss[loss=0.1213, beats_loss=0.01041, ecapa_loss=0.0001776, whisper_loss=0.1091, over 22167.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01095, ecapa_loss=0.0001741, whisper_loss=0.09232, over 3849096.05 frames. ], batch size: 90, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:36:20,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1726210.0, ans=0.2 2024-08-12 16:36:23,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1726210.0, ans=0.0 2024-08-12 16:36:25,775 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.556e+01 2.815e+01 3.284e+01 6.256e+01, threshold=5.630e+01, percent-clipped=1.0 2024-08-12 16:36:26,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1726210.0, ans=0.125 2024-08-12 16:36:34,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=12.0 2024-08-12 16:36:35,539 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.55 vs. limit=22.5 2024-08-12 16:36:38,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1726310.0, ans=0.125 2024-08-12 16:36:42,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1726310.0, ans=0.125 2024-08-12 16:37:03,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1726410.0, ans=0.015 2024-08-12 16:37:06,993 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
17 from LS+wenet, 16 from Vox, 26 from AS 2024-08-12 16:37:07,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1726510.0, ans=0.0 2024-08-12 16:37:10,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1726510.0, ans=0.1 2024-08-12 16:37:14,690 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 from AS 2024-08-12 16:37:21,366 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 17 from Vox, 38 from AS 2024-08-12 16:37:22,740 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 30 from Vox, 32 from AS 2024-08-12 16:37:24,772 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13250, loss[loss=0.1024, beats_loss=0.009997, ecapa_loss=0.0002181, whisper_loss=0.09027, over 20782.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01087, ecapa_loss=0.0001747, whisper_loss=0.09279, over 3827983.21 frames. ], batch size: 90, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:37:26,910 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 26 from Vox, 24 from AS 2024-08-12 16:37:27,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1726610.0, ans=0.125 2024-08-12 16:37:27,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1726610.0, ans=0.1 2024-08-12 16:37:29,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5 2024-08-12 16:37:31,754 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
27 from LS+wenet, 20 from Vox, 30 from AS 2024-08-12 16:37:37,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1726610.0, ans=0.0 2024-08-12 16:37:39,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1726610.0, ans=0.125 2024-08-12 16:37:42,487 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 from AS 2024-08-12 16:37:45,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2024-08-12 16:37:48,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1726710.0, ans=0.125 2024-08-12 16:37:50,918 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 24 from Vox, 27 from AS 2024-08-12 16:37:54,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1726710.0, ans=0.1 2024-08-12 16:38:01,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1726810.0, ans=0.125 2024-08-12 16:38:05,375 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.90 vs. limit=10.0 2024-08-12 16:38:15,246 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.63 vs. limit=15.0 2024-08-12 16:38:20,840 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 33 from Vox, 31 from AS 2024-08-12 16:38:35,968 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
24 from LS+wenet, 19 from Vox, 29 from AS 2024-08-12 16:38:49,467 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13300, loss[loss=0.09882, beats_loss=0.01171, ecapa_loss=0.0001713, whisper_loss=0.0854, over 21210.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01082, ecapa_loss=0.0001756, whisper_loss=0.09259, over 3809192.34 frames. ], batch size: 85, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:38:54,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1727110.0, ans=0.0 2024-08-12 16:38:58,462 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 12 from Vox, 37 from AS 2024-08-12 16:39:12,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.550e+01 2.829e+01 3.095e+01 6.127e+01, threshold=5.657e+01, percent-clipped=1.0 2024-08-12 16:39:22,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1727310.0, ans=0.125 2024-08-12 16:39:36,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1727410.0, ans=0.1 2024-08-12 16:39:36,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1727410.0, ans=0.125 2024-08-12 16:39:39,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1727410.0, ans=0.2 2024-08-12 16:39:43,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1727410.0, ans=0.0 2024-08-12 16:39:44,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1727410.0, ans=0.125 2024-08-12 16:40:07,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, 
batch_count=1727510.0, ans=0.1 2024-08-12 16:40:09,878 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13350, loss[loss=0.1104, beats_loss=0.00873, ecapa_loss=0.0002002, whisper_loss=0.09968, over 20662.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01079, ecapa_loss=0.0001737, whisper_loss=0.09324, over 3813924.45 frames. ], batch size: 82, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:40:46,299 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.504e+05 2024-08-12 16:41:04,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1727910.0, ans=0.125 2024-08-12 16:41:14,410 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 from AS 2024-08-12 16:41:20,841 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 18 from Vox, 23 from AS 2024-08-12 16:41:26,790 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 from AS 2024-08-12 16:41:31,275 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13400, loss[loss=0.09892, beats_loss=0.01046, ecapa_loss=0.0001638, whisper_loss=0.08683, over 17687.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01081, ecapa_loss=0.0001742, whisper_loss=0.09285, over 3820480.88 frames. ], batch size: 69, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:41:31,383 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
14 from LS+wenet, 20 from Vox, 23 from AS 2024-08-12 16:41:54,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.756e+01 3.172e+01 3.565e+01 5.325e+01, threshold=6.343e+01, percent-clipped=0.0 2024-08-12 16:41:54,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1728210.0, ans=0.0 2024-08-12 16:41:59,240 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2024-08-12 16:42:04,529 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 from AS 2024-08-12 16:42:07,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1728310.0, ans=0.0 2024-08-12 16:42:20,346 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 from AS 2024-08-12 16:42:30,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1728410.0, ans=0.125 2024-08-12 16:42:31,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1728410.0, ans=0.0 2024-08-12 16:42:50,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13450, loss[loss=0.1072, beats_loss=0.011, ecapa_loss=0.0001657, whisper_loss=0.09455, over 16943.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01089, ecapa_loss=0.0001745, whisper_loss=0.09207, over 3815932.80 frames. ], batch size: 66, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:43:02,524 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
22 from LS+wenet, 24 from Vox, 25 from AS 2024-08-12 16:43:05,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1728710.0, ans=0.0 2024-08-12 16:43:37,068 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 from AS 2024-08-12 16:43:47,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1728910.0, ans=0.125 2024-08-12 16:44:03,874 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 27 from LS+wenet, 14 from Vox, 19 from AS 2024-08-12 16:44:05,189 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 41 from LS+wenet, 20 from Vox, 31 from AS 2024-08-12 16:44:15,029 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13500, loss[loss=0.09249, beats_loss=0.01209, ecapa_loss=0.0001961, whisper_loss=0.07844, over 16579.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.0001745, whisper_loss=0.09191, over 3816931.64 frames. ], batch size: 69, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:44:28,194 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 from AS 2024-08-12 16:44:36,326 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 24 from Vox, 23 from AS 2024-08-12 16:44:38,852 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.512e+01 2.797e+01 3.062e+01 5.746e+01, threshold=5.594e+01, percent-clipped=0.0 2024-08-12 16:44:45,080 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
22 from LS+wenet, 22 from Vox, 41 from AS 2024-08-12 16:44:47,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1729310.0, ans=0.1 2024-08-12 16:44:48,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1729310.0, ans=0.0 2024-08-12 16:44:51,444 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 from AS 2024-08-12 16:45:01,498 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 26 from Vox, 26 from AS 2024-08-12 16:45:07,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1729410.0, ans=0.2 2024-08-12 16:45:12,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1729410.0, ans=0.125 2024-08-12 16:45:34,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1729510.0, ans=0.05 2024-08-12 16:45:38,604 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13550, loss[loss=0.1002, beats_loss=0.01079, ecapa_loss=0.0001738, whisper_loss=0.08765, over 18958.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01087, ecapa_loss=0.0001745, whisper_loss=0.09237, over 3840045.73 frames. ], batch size: 74, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:45:41,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=22.5 2024-08-12 16:45:49,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. 
limit=15.0 2024-08-12 16:45:53,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1729610.0, ans=0.0 2024-08-12 16:46:00,200 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 from AS 2024-08-12 16:47:05,957 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13600, loss[loss=0.1098, beats_loss=0.01127, ecapa_loss=0.0001572, whisper_loss=0.09696, over 19408.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01094, ecapa_loss=0.0001742, whisper_loss=0.09262, over 3845299.21 frames. ], batch size: 72, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:47:31,033 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.482e+01 2.733e+01 3.104e+01 2.478e+02, threshold=5.467e+01, percent-clipped=1.0 2024-08-12 16:47:37,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1730210.0, ans=0.1 2024-08-12 16:47:45,140 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:47:46,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1730310.0, ans=0.0 2024-08-12 16:48:03,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1730410.0, ans=0.1 2024-08-12 16:48:10,388 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 20 from LS+wenet, 20 from Vox, 44 from AS 2024-08-12 16:48:26,975 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.53 vs. limit=15.0 2024-08-12 16:48:27,349 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.84 vs. 
limit=15.0 2024-08-12 16:48:31,126 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13650, loss[loss=0.1277, beats_loss=0.01051, ecapa_loss=0.00019, whisper_loss=0.1153, over 14080.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01097, ecapa_loss=0.0001728, whisper_loss=0.09284, over 3855031.90 frames. ], batch size: 56, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:48:33,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1730610.0, ans=0.125 2024-08-12 16:48:38,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1730610.0, ans=0.1 2024-08-12 16:48:39,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1730610.0, ans=0.0 2024-08-12 16:48:53,459 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2024-08-12 16:49:05,357 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 from AS 2024-08-12 16:49:15,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.16 vs. limit=6.0 2024-08-12 16:49:46,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1731010.0, ans=0.015 2024-08-12 16:49:53,465 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 15 from LS+wenet, 22 from Vox, 30 from AS 2024-08-12 16:50:06,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13700, loss[loss=0.08047, beats_loss=0.01208, ecapa_loss=0.0001498, whisper_loss=0.06689, over 20906.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01102, ecapa_loss=0.0001742, whisper_loss=0.0915, over 3853675.32 frames. 
], batch size: 83, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:50:08,325 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS 2024-08-12 16:50:15,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1731110.0, ans=0.1 2024-08-12 16:50:34,624 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.487e+01 2.754e+01 3.214e+01 5.264e+01, threshold=5.508e+01, percent-clipped=0.0 2024-08-12 16:50:38,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1731210.0, ans=0.0 2024-08-12 16:51:18,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1731510.0, ans=0.1 2024-08-12 16:51:22,905 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 from AS 2024-08-12 16:51:28,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1731510.0, ans=0.2 2024-08-12 16:51:33,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13750, loss[loss=0.1087, beats_loss=0.00886, ecapa_loss=0.0002476, whisper_loss=0.09735, over 17867.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01108, ecapa_loss=0.0001738, whisper_loss=0.09126, over 3870949.03 frames. 
], batch size: 76, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:51:44,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1731610.0, ans=0.05 2024-08-12 16:51:48,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1731710.0, ans=0.1 2024-08-12 16:51:51,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1731710.0, ans=0.1 2024-08-12 16:52:34,020 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 24 from Vox, 32 from AS 2024-08-12 16:52:38,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1731910.0, ans=0.0 2024-08-12 16:52:40,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1732010.0, ans=0.125 2024-08-12 16:52:59,349 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13800, loss[loss=0.08425, beats_loss=0.01568, ecapa_loss=0.0001382, whisper_loss=0.06719, over 21864.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01103, ecapa_loss=0.0001744, whisper_loss=0.09167, over 3881499.61 frames. 
], batch size: 90, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:53:00,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1732110.0, ans=0.0 2024-08-12 16:53:02,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1732110.0, ans=0.1 2024-08-12 16:53:18,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1732210.0, ans=0.125 2024-08-12 16:53:24,884 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.542e+01 2.940e+01 3.312e+01 1.437e+02, threshold=5.879e+01, percent-clipped=2.0 2024-08-12 16:53:31,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2024-08-12 16:53:33,697 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 36 from LS+wenet, 22 from Vox, 27 from AS 2024-08-12 16:53:40,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1732310.0, ans=0.0 2024-08-12 16:53:42,849 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:54:28,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13850, loss[loss=0.0947, beats_loss=0.01315, ecapa_loss=0.0001408, whisper_loss=0.08014, over 22503.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01091, ecapa_loss=0.0001743, whisper_loss=0.0929, over 3910151.44 frames. ], batch size: 90, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:55:45,624 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 from AS 2024-08-12 16:55:58,090 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
24 from LS+wenet, 12 from Vox, 22 from AS 2024-08-12 16:55:59,217 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13900, loss[loss=0.12, beats_loss=0.008959, ecapa_loss=0.0001637, whisper_loss=0.1094, over 15348.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01097, ecapa_loss=0.0001734, whisper_loss=0.0922, over 3886875.67 frames. ], batch size: 58, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:56:05,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1733110.0, ans=0.125 2024-08-12 16:56:25,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1733210.0, ans=0.1 2024-08-12 16:56:25,961 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.635e+01 2.870e+01 3.246e+01 6.120e+01, threshold=5.740e+01, percent-clipped=1.0 2024-08-12 16:56:32,936 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 16 from LS+wenet, 34 from Vox, 30 from AS 2024-08-12 16:56:36,100 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.85 vs. limit=22.5 2024-08-12 16:56:59,866 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 30 from Vox, 30 from AS 2024-08-12 16:57:21,623 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 13950, loss[loss=0.122, beats_loss=0.007812, ecapa_loss=0.0002205, whisper_loss=0.112, over 18824.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01088, ecapa_loss=0.0001745, whisper_loss=0.09265, over 3908342.19 frames. ], batch size: 78, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:57:27,374 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
34 from LS+wenet, 18 from Vox, 29 from AS 2024-08-12 16:57:39,622 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=12.0 2024-08-12 16:57:41,754 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 17 from Vox, 40 from AS 2024-08-12 16:58:01,932 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0 2024-08-12 16:58:06,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1733810.0, ans=0.0 2024-08-12 16:58:37,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1734010.0, ans=0.1 2024-08-12 16:58:44,948 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 14000, loss[loss=0.1057, beats_loss=0.007775, ecapa_loss=0.0001927, whisper_loss=0.09599, over 18744.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01088, ecapa_loss=0.0001746, whisper_loss=0.09242, over 3906834.74 frames. ], batch size: 75, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:59:09,564 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.481e+01 2.768e+01 3.199e+01 7.750e+01, threshold=5.536e+01, percent-clipped=1.0 2024-08-12 16:59:28,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1734310.0, ans=0.125 2024-08-12 16:59:42,359 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2024-08-12 17:00:06,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. 
limit=15.0 2024-08-12 17:00:07,685 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS 2024-08-12 17:00:14,583 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 14050, loss[loss=0.1057, beats_loss=0.009132, ecapa_loss=0.0002108, whisper_loss=0.09445, over 18550.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01086, ecapa_loss=0.000174, whisper_loss=0.09301, over 3901458.42 frames. ], batch size: 77, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:00:32,138 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 17 from Vox, 35 from AS 2024-08-12 17:00:36,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1734710.0, ans=0.09899494936611666 2024-08-12 17:00:39,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1734710.0, ans=0.125 2024-08-12 17:00:46,437 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 21 from LS+wenet, 8 from Vox, 27 from AS 2024-08-12 17:00:57,739 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 from AS 2024-08-12 17:01:02,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1734810.0, ans=0.0 2024-08-12 17:01:35,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1735010.0, ans=0.125 2024-08-12 17:01:37,321 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.97 vs. limit=15.0 2024-08-12 17:01:38,245 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 from AS 2024-08-12 17:01:39,372 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.92 vs. 
limit=12.0 2024-08-12 17:01:41,455 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 14100, loss[loss=0.1093, beats_loss=0.009522, ecapa_loss=0.0002126, whisper_loss=0.09761, over 20909.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01082, ecapa_loss=0.0001742, whisper_loss=0.09333, over 3897170.81 frames. ], batch size: 88, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:01:42,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.56 vs. limit=15.0 2024-08-12 17:01:53,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2024-08-12 17:02:09,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1735210.0, ans=0.2 2024-08-12 17:02:09,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1735210.0, ans=0.0 2024-08-12 17:02:10,218 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.510e+01 2.862e+01 3.257e+01 4.688e+01, threshold=5.723e+01, percent-clipped=0.0 2024-08-12 17:02:15,377 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 21 from Vox, 48 from AS 2024-08-12 17:02:23,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1735310.0, ans=10.0 2024-08-12 17:03:10,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 14150, loss[loss=0.1011, beats_loss=0.0101, ecapa_loss=0.0002068, whisper_loss=0.0889, over 19795.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01078, ecapa_loss=0.0001741, whisper_loss=0.09353, over 3884247.83 frames. 
], batch size: 80, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:03:49,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-08-12 17:04:06,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1735810.0, ans=0.125 2024-08-12 17:04:13,942 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 from AS 2024-08-12 17:04:24,223 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 11 from LS+wenet, 17 from Vox, 27 from AS 2024-08-12 17:04:36,373 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-08-12 17:04:50,226 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 14200, loss[loss=0.1174, beats_loss=0.008305, ecapa_loss=0.0001871, whisper_loss=0.1072, over 19109.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01077, ecapa_loss=0.0001741, whisper_loss=0.09379, over 3899358.00 frames. ], batch size: 75, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:04:54,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1736110.0, ans=0.0 2024-08-12 17:05:02,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1736110.0, ans=0.0 2024-08-12 17:05:10,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.19 vs. 
limit=22.5 2024-08-12 17:05:11,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1736210.0, ans=0.125 2024-08-12 17:05:14,309 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.568e+01 2.822e+01 3.210e+01 8.568e+01, threshold=5.645e+01, percent-clipped=1.0 2024-08-12 17:05:23,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2024-08-12 17:05:30,243 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 from AS 2024-08-12 17:05:43,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1736410.0, ans=0.125 2024-08-12 17:05:51,877 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 from AS 2024-08-12 17:05:53,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1736510.0, ans=0.1 2024-08-12 17:06:06,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1736510.0, ans=0.125 2024-08-12 17:06:10,906 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 14250, loss[loss=0.1117, beats_loss=0.009997, ecapa_loss=0.0001695, whisper_loss=0.1, over 20327.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01077, ecapa_loss=0.0001737, whisper_loss=0.09354, over 3894529.06 frames. ], batch size: 77, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:06:30,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1736710.0, ans=0.0 2024-08-12 17:06:33,271 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
22 from LS+wenet, 17 from Vox, 34 from AS 2024-08-12 17:06:45,477 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 from AS 2024-08-12 17:06:50,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.12 vs. limit=15.0 2024-08-12 17:07:01,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1736810.0, ans=0.035 2024-08-12 17:07:28,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1737010.0, ans=0.2 2024-08-12 17:07:32,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1737010.0, ans=0.125 2024-08-12 17:07:36,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.69 vs. limit=15.0 2024-08-12 17:07:37,096 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 17 from Vox, 34 from AS 2024-08-12 17:07:41,379 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 from AS 2024-08-12 17:07:42,739 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 from AS 2024-08-12 17:07:42,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1737110.0, ans=0.1 2024-08-12 17:07:44,006 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 14300, loss[loss=0.0955, beats_loss=0.01109, ecapa_loss=0.0001557, whisper_loss=0.08285, over 17078.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01078, ecapa_loss=0.0001735, whisper_loss=0.0934, over 3912497.07 frames. ], batch size: 67, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:07:48,491 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 12 from Vox, 29 from AS 2024-08-12 17:08:10,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1737210.0, ans=0.125 2024-08-12 17:08:10,784 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.619e+01 2.822e+01 3.259e+01 8.695e+01, threshold=5.643e+01, percent-clipped=1.0 2024-08-12 17:08:27,211 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=12.0 2024-08-12 17:08:29,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.19 vs. limit=10.0 2024-08-12 17:08:50,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1737410.0, ans=0.125 2024-08-12 17:08:54,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2024-08-12 17:09:03,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1737510.0, ans=0.0 2024-08-12 17:09:11,220 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 14350, loss[loss=0.08957, beats_loss=0.01133, ecapa_loss=0.0001517, whisper_loss=0.07672, over 19727.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01094, ecapa_loss=0.0001721, whisper_loss=0.09158, over 3903660.65 frames. ], batch size: 79, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:09:36,389 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 27 from LS+wenet, 20 from Vox, 11 from AS 2024-08-12 17:10:51,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. 
limit=15.0 2024-08-12 17:11:07,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1738010.0, ans=0.04949747468305833 2024-08-12 17:11:13,066 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 14400, loss[loss=0.1151, beats_loss=0.009463, ecapa_loss=0.0002191, whisper_loss=0.1035, over 21480.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001731, whisper_loss=0.09196, over 3920120.86 frames. ], batch size: 90, lr: 5.14e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:11:18,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1738110.0, ans=0.0 2024-08-12 17:11:28,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1738110.0, ans=0.125 2024-08-12 17:11:44,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2024-08-12 17:11:44,594 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.468e+01 2.751e+01 3.183e+01 4.709e+01, threshold=5.502e+01, percent-clipped=0.0 2024-08-12 17:11:47,803 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
20 from LS+wenet, 15 from Vox, 32 from AS 2024-08-12 17:12:11,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1738310.0, ans=0.125 2024-08-12 17:12:17,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1738410.0, ans=0.0 2024-08-12 17:12:19,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1738410.0, ans=0.125 2024-08-12 17:12:24,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2024-08-12 17:12:34,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1738410.0, ans=0.0 2024-08-12 17:12:49,988 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 from AS 2024-08-12 17:12:52,969 INFO [train_multi_KD3.py:1116] (2/4) Epoch 12, batch 14450, loss[loss=0.1062, beats_loss=0.01098, ecapa_loss=0.0001659, whisper_loss=0.09357, over 19668.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01095, ecapa_loss=0.0001732, whisper_loss=0.09196, over 3930629.91 frames. ], batch size: 75, lr: 5.14e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:13:46,995 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 from AS 2024-08-12 17:13:56,279 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-08-12 17:14:00,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1739010.0, ans=0.0 2024-08-12 17:14:02,982 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
15 from LS+wenet, 19 from Vox, 28 from AS 2024-08-12 17:14:04,632 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 from AS 2024-08-12 17:15:01,338 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 0, loss[loss=0.09059, beats_loss=0.01291, ecapa_loss=0.0001806, whisper_loss=0.07587, over 18525.00 frames. ], tot_loss[loss=0.09059, beats_loss=0.01291, ecapa_loss=0.0001806, whisper_loss=0.07587, over 18525.00 frames. ], batch size: 76, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:15:01,338 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 17:15:45,032 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on ASR_libri: loss=0.255, beats_loss=0, ecapa_loss=0.0005844, whisper_loss=0.2492, over 922467.00 frames. 2024-08-12 17:16:01,530 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on SV_voxceleb1: loss=0.004777, beats_loss=0, ecapa_loss=0.0004777, whisper_loss=0, over 939242.00 frames. 2024-08-12 17:17:35,316 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9992, 3.3330, 2.2550, 3.8115], device='cuda:2') 2024-08-12 17:18:04,536 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on AT_audioset: loss=0.02416, beats_loss=0.02416, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 17:18:04,544 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 17:18:42,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1739180.0, ans=0.125 2024-08-12 17:18:55,536 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.525e+01 2.835e+01 3.382e+01 8.605e+01, threshold=5.671e+01, percent-clipped=1.0 2024-08-12 17:19:46,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1739380.0, ans=0.1 2024-08-12 17:20:18,944 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 50, loss[loss=0.09944, beats_loss=0.0109, ecapa_loss=0.0001801, whisper_loss=0.08673, over 23099.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01017, ecapa_loss=0.0001764, whisper_loss=0.09168, over 887381.93 frames. ], batch size: 93, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:21:08,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-12 17:21:12,029 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 6 from Vox, 28 from AS 2024-08-12 17:21:26,517 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 23 from Vox, 32 from AS 2024-08-12 17:21:36,307 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.79 vs. 
limit=15.0 2024-08-12 17:21:40,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1739880.0, ans=0.125 2024-08-12 17:22:00,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1739980.0, ans=0.0 2024-08-12 17:22:06,719 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 30 from Vox, 38 from AS 2024-08-12 17:22:20,257 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 100, loss[loss=0.103, beats_loss=0.007022, ecapa_loss=0.0002399, whisper_loss=0.09363, over 15854.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01023, ecapa_loss=0.0001737, whisper_loss=0.08913, over 1520824.43 frames. ], batch size: 65, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:22:37,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1740080.0, ans=0.125 2024-08-12 17:22:39,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1740080.0, ans=0.1 2024-08-12 17:23:05,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.865e+01 3.060e+01 3.356e+01 6.213e+01, threshold=6.120e+01, percent-clipped=1.0 2024-08-12 17:23:06,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1740180.0, ans=0.2 2024-08-12 17:23:13,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1740280.0, ans=0.125 2024-08-12 17:23:20,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1740280.0, ans=0.2 2024-08-12 17:23:22,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, 
batch_count=1740280.0, ans=0.125 2024-08-12 17:24:00,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2024-08-12 17:24:09,514 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.28 vs. limit=12.0 2024-08-12 17:24:13,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1740480.0, ans=0.0 2024-08-12 17:24:15,848 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 150, loss[loss=0.093, beats_loss=0.01136, ecapa_loss=0.0002023, whisper_loss=0.07962, over 16553.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01019, ecapa_loss=0.0001735, whisper_loss=0.09028, over 2025777.79 frames. ], batch size: 66, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:24:29,203 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 17 from Vox, 35 from AS 2024-08-12 17:24:50,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1740680.0, ans=0.0 2024-08-12 17:24:59,668 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 from AS 2024-08-12 17:25:00,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1740780.0, ans=0.125 2024-08-12 17:25:15,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=12.0 2024-08-12 17:25:42,870 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 200, loss[loss=0.1025, beats_loss=0.01245, ecapa_loss=0.0001481, whisper_loss=0.08852, over 23665.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01035, ecapa_loss=0.000174, whisper_loss=0.09127, over 2436276.79 frames. 
], batch size: 93, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:26:00,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1741180.0, ans=0.125 2024-08-12 17:26:02,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1741180.0, ans=0.0 2024-08-12 17:26:11,347 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.594e+01 3.008e+01 3.381e+01 4.307e+01, threshold=6.015e+01, percent-clipped=0.0 2024-08-12 17:26:11,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1741180.0, ans=0.1 2024-08-12 17:26:13,134 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 16 from LS+wenet, 28 from Vox, 42 from AS 2024-08-12 17:26:13,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-08-12 17:26:40,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=12.0 2024-08-12 17:26:40,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.42 vs. limit=12.0 2024-08-12 17:26:52,226 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0 2024-08-12 17:27:00,255 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 250, loss[loss=0.1072, beats_loss=0.01001, ecapa_loss=0.0001572, whisper_loss=0.09562, over 19460.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01042, ecapa_loss=0.0001736, whisper_loss=0.09165, over 2738968.40 frames. 
], batch size: 70, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:27:00,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1741580.0, ans=0.0 2024-08-12 17:27:02,334 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 28 from Vox, 31 from AS 2024-08-12 17:27:15,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1741680.0, ans=0.125 2024-08-12 17:27:18,527 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 from AS 2024-08-12 17:27:25,781 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 28 from LS+wenet, 12 from Vox, 31 from AS 2024-08-12 17:27:30,944 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 33 from LS+wenet, 19 from Vox, 30 from AS 2024-08-12 17:27:37,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1741780.0, ans=0.125 2024-08-12 17:27:42,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1741780.0, ans=0.0 2024-08-12 17:27:45,005 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 12 from Vox, 29 from AS 2024-08-12 17:27:46,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1741880.0, ans=0.2 2024-08-12 17:27:49,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-12 17:27:55,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1741880.0, ans=0.0 2024-08-12 17:28:00,512 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
21 from LS+wenet, 17 from Vox, 33 from AS 2024-08-12 17:28:04,850 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.928e+02 2024-08-12 17:28:07,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1741980.0, ans=0.0 2024-08-12 17:28:15,089 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 17:28:17,814 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 300, loss[loss=0.1244, beats_loss=0.01004, ecapa_loss=0.0001589, whisper_loss=0.1127, over 23453.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01055, ecapa_loss=0.0001731, whisper_loss=0.09227, over 3020867.58 frames. ], batch size: 90, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:28:20,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-12 17:28:31,477 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 19 from LS+wenet, 23 from Vox, 40 from AS 2024-08-12 17:28:32,740 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 13 from Vox, 21 from AS 2024-08-12 17:28:34,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1742180.0, ans=0.0 2024-08-12 17:28:44,480 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.349e+01 2.732e+01 3.113e+01 6.634e+01, threshold=5.463e+01, percent-clipped=1.0 2024-08-12 17:28:45,336 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 16 from Vox, 43 from AS 2024-08-12 17:29:32,752 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 350, loss[loss=0.07837, beats_loss=0.01139, ecapa_loss=0.0001568, whisper_loss=0.06542, over 19779.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.0001725, whisper_loss=0.09119, over 3180728.09 frames. ], batch size: 77, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:29:38,409 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 20 from Vox, 36 from AS 2024-08-12 17:29:51,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1742680.0, ans=0.125 2024-08-12 17:29:57,762 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 30 from LS+wenet, 18 from Vox, 17 from AS 2024-08-12 17:30:09,347 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 15 from Vox, 32 from AS 2024-08-12 17:30:30,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1742980.0, ans=0.0 2024-08-12 17:30:44,510 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 400, loss[loss=0.1108, beats_loss=0.01257, ecapa_loss=0.0001617, whisper_loss=0.09664, over 20357.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01071, ecapa_loss=0.000172, whisper_loss=0.09126, over 3298147.60 frames. ], batch size: 78, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:30:47,636 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 17 from Vox, 36 from AS 2024-08-12 17:30:49,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1743080.0, ans=0.0 2024-08-12 17:31:01,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1743180.0, ans=0.125 2024-08-12 17:31:10,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.525e+01 2.765e+01 3.244e+01 1.385e+02, threshold=5.529e+01, percent-clipped=2.0 2024-08-12 17:31:23,826 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
17 from LS+wenet, 17 from Vox, 27 from AS 2024-08-12 17:31:31,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1743380.0, ans=0.125 2024-08-12 17:31:51,849 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 17:31:55,657 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 11 from Vox, 27 from AS 2024-08-12 17:31:58,343 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 450, loss[loss=0.07601, beats_loss=0.01594, ecapa_loss=0.0001634, whisper_loss=0.05844, over 21990.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01077, ecapa_loss=0.0001716, whisper_loss=0.09051, over 3385124.73 frames. ], batch size: 93, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:32:21,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1743680.0, ans=0.125 2024-08-12 17:32:31,804 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 24 from LS+wenet, 13 from Vox, 20 from AS 2024-08-12 17:32:59,733 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 from AS 2024-08-12 17:33:11,486 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 500, loss[loss=0.1004, beats_loss=0.01208, ecapa_loss=0.0001418, whisper_loss=0.08688, over 17458.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01078, ecapa_loss=0.000171, whisper_loss=0.09106, over 3467857.45 frames. ], batch size: 67, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:33:12,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2024-08-12 17:33:27,152 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
23 from LS+wenet, 24 from Vox, 36 from AS 2024-08-12 17:33:30,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=15.0 2024-08-12 17:33:40,522 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.531e+01 2.780e+01 3.170e+01 4.119e+01, threshold=5.561e+01, percent-clipped=0.0 2024-08-12 17:33:48,608 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.48 vs. limit=15.0 2024-08-12 17:33:57,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1744280.0, ans=0.125 2024-08-12 17:34:00,578 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 from AS 2024-08-12 17:34:24,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1744480.0, ans=0.125 2024-08-12 17:34:28,007 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 from AS 2024-08-12 17:34:30,865 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 550, loss[loss=0.09055, beats_loss=0.01051, ecapa_loss=0.0002022, whisper_loss=0.07802, over 16971.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01067, ecapa_loss=0.0001708, whisper_loss=0.0917, over 3568968.61 frames. ], batch size: 71, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:34:37,294 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
27 from LS+wenet, 14 from Vox, 36 from AS 2024-08-12 17:34:40,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1744580.0, ans=0.125 2024-08-12 17:35:08,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1744780.0, ans=0.125 2024-08-12 17:35:30,051 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.29 vs. limit=15.0 2024-08-12 17:35:44,904 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 17:35:45,751 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 600, loss[loss=0.1226, beats_loss=0.009119, ecapa_loss=0.0001949, whisper_loss=0.1115, over 15647.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01073, ecapa_loss=0.0001696, whisper_loss=0.09203, over 3681709.90 frames. ], batch size: 62, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:35:53,919 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 from AS 2024-08-12 17:35:54,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1745080.0, ans=0.0 2024-08-12 17:36:04,548 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 from AS 2024-08-12 17:36:11,576 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.517e+01 2.834e+01 3.150e+01 6.498e+01, threshold=5.667e+01, percent-clipped=2.0 2024-08-12 17:36:26,695 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
30 from LS+wenet, 19 from Vox, 43 from AS 2024-08-12 17:36:38,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1745380.0, ans=0.125 2024-08-12 17:36:39,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1745380.0, ans=0.0 2024-08-12 17:36:57,707 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 650, loss[loss=0.1074, beats_loss=0.01226, ecapa_loss=0.0001761, whisper_loss=0.09338, over 16688.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01078, ecapa_loss=0.0001701, whisper_loss=0.09174, over 3702204.40 frames. ], batch size: 69, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:37:00,294 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 31 from LS+wenet, 19 from Vox, 27 from AS 2024-08-12 17:37:03,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1745580.0, ans=0.125 2024-08-12 17:37:20,320 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 from AS 2024-08-12 17:37:34,148 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 18 from Vox, 19 from AS 2024-08-12 17:37:35,717 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 28 from LS+wenet, 19 from Vox, 18 from AS 2024-08-12 17:37:36,445 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. 
limit=6.0 2024-08-12 17:37:44,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1745880.0, ans=0.5 2024-08-12 17:37:46,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1745880.0, ans=0.125 2024-08-12 17:37:57,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1745980.0, ans=0.1 2024-08-12 17:38:10,843 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 700, loss[loss=0.1079, beats_loss=0.008913, ecapa_loss=0.0001986, whisper_loss=0.09703, over 22169.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01074, ecapa_loss=0.0001702, whisper_loss=0.09196, over 3745886.83 frames. ], batch size: 90, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:38:17,266 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 22 from Vox, 49 from AS 2024-08-12 17:38:19,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.45 vs. limit=15.0 2024-08-12 17:38:22,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1746080.0, ans=0.0 2024-08-12 17:38:24,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.74 vs. 
limit=12.0 2024-08-12 17:38:37,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.446e+01 2.651e+01 3.040e+01 5.006e+01, threshold=5.302e+01, percent-clipped=0.0 2024-08-12 17:38:38,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1746180.0, ans=0.125 2024-08-12 17:38:49,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1746280.0, ans=0.1 2024-08-12 17:39:15,658 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 from AS 2024-08-12 17:39:18,415 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-08-12 17:39:20,871 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 15 from Vox, 50 from AS 2024-08-12 17:39:24,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 750, loss[loss=0.09165, beats_loss=0.01144, ecapa_loss=0.0001793, whisper_loss=0.07842, over 21623.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01079, ecapa_loss=0.0001699, whisper_loss=0.09188, over 3774887.96 frames. ], batch size: 86, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:39:59,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1746780.0, ans=0.125 2024-08-12 17:40:02,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=15.0 2024-08-12 17:40:04,106 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 from AS 2024-08-12 17:40:09,803 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
21 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 17:40:11,300 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-12 17:40:21,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1746980.0, ans=0.0 2024-08-12 17:40:36,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1747080.0, ans=0.125 2024-08-12 17:40:37,182 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 800, loss[loss=0.1138, beats_loss=0.009925, ecapa_loss=0.0001749, whisper_loss=0.1021, over 13639.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.0001706, whisper_loss=0.09103, over 3767075.17 frames. ], batch size: 54, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:40:38,840 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 17:40:59,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0 2024-08-12 17:41:00,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1747180.0, ans=0.1 2024-08-12 17:41:01,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1747180.0, ans=0.125 2024-08-12 17:41:03,613 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.398e+01 2.726e+01 3.050e+01 4.286e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-12 17:41:08,255 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
21 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 17:41:23,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1747380.0, ans=0.07 2024-08-12 17:41:26,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-12 17:41:27,675 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-12 17:41:40,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1747480.0, ans=0.025 2024-08-12 17:41:43,887 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 17:41:51,279 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 850, loss[loss=0.1151, beats_loss=0.008579, ecapa_loss=0.0001818, whisper_loss=0.1047, over 23749.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.0001691, whisper_loss=0.09067, over 3765708.58 frames. ], batch size: 91, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:41:54,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1747580.0, ans=0.025 2024-08-12 17:41:55,969 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 17:42:28,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1747780.0, ans=0.0 2024-08-12 17:42:40,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1747880.0, ans=0.125 2024-08-12 17:42:44,932 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.90 vs. 
limit=22.5 2024-08-12 17:42:55,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1747980.0, ans=0.1 2024-08-12 17:42:55,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1747980.0, ans=0.125 2024-08-12 17:42:56,262 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-12 17:43:00,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1747980.0, ans=0.125 2024-08-12 17:43:06,044 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 900, loss[loss=0.1097, beats_loss=0.009629, ecapa_loss=0.0001847, whisper_loss=0.09825, over 16970.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01076, ecapa_loss=0.0001687, whisper_loss=0.09126, over 3760753.47 frames. ], batch size: 67, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:43:17,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1748080.0, ans=0.1 2024-08-12 17:43:18,229 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 17:43:26,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1748180.0, ans=0.2 2024-08-12 17:43:32,668 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.406e+01 2.653e+01 2.914e+01 6.572e+01, threshold=5.306e+01, percent-clipped=1.0 2024-08-12 17:44:01,983 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2024-08-12 17:44:09,701 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
21 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-12 17:44:11,266 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 17:44:16,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1748580.0, ans=0.125 2024-08-12 17:44:17,617 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 950, loss[loss=0.109, beats_loss=0.01085, ecapa_loss=0.0001922, whisper_loss=0.09625, over 21773.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01083, ecapa_loss=0.000168, whisper_loss=0.09039, over 3761836.47 frames. ], batch size: 92, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:44:28,187 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 17:44:33,057 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.52 vs. limit=15.0 2024-08-12 17:44:42,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1748680.0, ans=0.09899494936611666 2024-08-12 17:44:46,169 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 17:44:46,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.92 vs. 
limit=10.0 2024-08-12 17:44:50,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1748780.0, ans=0.125 2024-08-12 17:45:04,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1748880.0, ans=0.0 2024-08-12 17:45:11,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1748880.0, ans=0.0 2024-08-12 17:45:14,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1748980.0, ans=0.125 2024-08-12 17:45:15,822 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 30 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-12 17:45:22,680 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 17:45:24,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1748980.0, ans=0.125 2024-08-12 17:45:27,790 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1000, loss[loss=0.1111, beats_loss=0.01089, ecapa_loss=0.0001335, whisper_loss=0.0989, over 19009.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01084, ecapa_loss=0.0001673, whisper_loss=0.09055, over 3812366.38 frames. ], batch size: 74, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:45:30,933 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 17:45:48,187 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-12 17:45:53,647 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.479e+01 2.731e+01 3.171e+01 4.511e+01, threshold=5.462e+01, percent-clipped=0.0 2024-08-12 17:45:53,925 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
17 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-12 17:46:02,358 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 17:46:18,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1749380.0, ans=0.125 2024-08-12 17:46:23,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1749380.0, ans=0.0 2024-08-12 17:46:24,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1749380.0, ans=0.125 2024-08-12 17:46:38,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1749480.0, ans=0.2 2024-08-12 17:46:41,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1050, loss[loss=0.08649, beats_loss=0.01262, ecapa_loss=0.0001161, whisper_loss=0.07271, over 18901.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01085, ecapa_loss=0.0001677, whisper_loss=0.09019, over 3805464.94 frames. ], batch size: 70, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:46:47,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1749580.0, ans=0.0 2024-08-12 17:47:04,981 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=15.0 2024-08-12 17:47:07,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1749680.0, ans=0.1 2024-08-12 17:47:16,108 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 17:47:16,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. 
limit=15.0 2024-08-12 17:47:41,921 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 30 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 17:47:54,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1749980.0, ans=0.1 2024-08-12 17:47:57,854 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1100, loss[loss=0.1028, beats_loss=0.01288, ecapa_loss=0.000136, whisper_loss=0.08855, over 24404.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.000167, whisper_loss=0.09097, over 3825244.52 frames. ], batch size: 94, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:48:02,622 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 15 from Vox, 50 fro AS 2024-08-12 17:48:09,738 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 17:48:10,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.65 vs. limit=22.5 2024-08-12 17:48:24,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.582e+01 2.825e+01 3.154e+01 4.424e+01, threshold=5.651e+01, percent-clipped=0.0 2024-08-12 17:48:28,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1750280.0, ans=0.1 2024-08-12 17:48:36,727 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 17:48:44,506 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-12 17:48:49,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. 
limit=15.0 2024-08-12 17:48:50,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1750380.0, ans=0.125 2024-08-12 17:48:50,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.23 vs. limit=22.5 2024-08-12 17:48:59,593 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 27 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-12 17:49:23,394 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1150, loss[loss=0.1204, beats_loss=0.009466, ecapa_loss=0.0001681, whisper_loss=0.1093, over 19755.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.000167, whisper_loss=0.09178, over 3854226.88 frames. ], batch size: 75, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:49:24,492 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 17:49:43,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1750680.0, ans=0.5 2024-08-12 17:49:50,555 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-12 17:49:52,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1750680.0, ans=0.0 2024-08-12 17:50:00,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1750780.0, ans=0.125 2024-08-12 17:50:28,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1750880.0, ans=0.1 2024-08-12 17:50:33,678 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.05 vs. 
limit=12.0 2024-08-12 17:50:38,468 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 17:50:50,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1751080.0, ans=0.09899494936611666 2024-08-12 17:50:51,300 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1200, loss[loss=0.08273, beats_loss=0.009687, ecapa_loss=0.0002081, whisper_loss=0.07096, over 16481.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.0001672, whisper_loss=0.09138, over 3849129.84 frames. ], batch size: 69, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:50:57,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1751080.0, ans=0.04949747468305833 2024-08-12 17:51:06,448 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 17:51:27,987 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.378e+01 2.599e+01 3.054e+01 4.994e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-12 17:51:50,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1751280.0, ans=0.125 2024-08-12 17:52:28,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1751480.0, ans=0.1 2024-08-12 17:52:37,534 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1250, loss[loss=0.1064, beats_loss=0.01335, ecapa_loss=0.0001644, whisper_loss=0.09144, over 22247.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001667, whisper_loss=0.09095, over 3845074.87 frames. 
], batch size: 88, lr: 4.93e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:53:11,804 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 17:53:26,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1751780.0, ans=0.125 2024-08-12 17:53:26,485 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 17:53:45,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.52 vs. limit=10.0 2024-08-12 17:53:51,238 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 17:53:56,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=1751880.0, ans=0.1 2024-08-12 17:54:01,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1751880.0, ans=0.1 2024-08-12 17:54:23,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1751980.0, ans=0.0 2024-08-12 17:54:27,511 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1300, loss[loss=0.1085, beats_loss=0.00982, ecapa_loss=0.0001724, whisper_loss=0.09691, over 16239.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01091, ecapa_loss=0.0001674, whisper_loss=0.09042, over 3832570.97 frames. 
], batch size: 65, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:54:50,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1752180.0, ans=10.0 2024-08-12 17:55:04,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.28 vs. limit=22.5 2024-08-12 17:55:06,903 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.406e+01 2.650e+01 2.964e+01 4.612e+01, threshold=5.300e+01, percent-clipped=0.0 2024-08-12 17:55:08,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1752180.0, ans=0.1 2024-08-12 17:55:09,754 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 17:55:30,929 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-12 17:55:56,840 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 17:56:14,973 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1350, loss[loss=0.1155, beats_loss=0.01071, ecapa_loss=0.0001474, whisper_loss=0.1033, over 22979.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01097, ecapa_loss=0.0001667, whisper_loss=0.0899, over 3862666.92 frames. ], batch size: 89, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:56:21,704 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 14 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 17:56:29,897 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 17:56:41,139 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 39 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-12 17:56:49,272 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
39 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-12 17:56:51,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1752680.0, ans=0.0 2024-08-12 17:57:16,400 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-12 17:57:19,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1752880.0, ans=0.125 2024-08-12 17:57:26,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1752980.0, ans=0.0 2024-08-12 17:57:29,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1752980.0, ans=0.125 2024-08-12 17:57:36,490 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-12 17:57:38,114 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1400, loss[loss=0.1091, beats_loss=0.01093, ecapa_loss=0.000147, whisper_loss=0.0967, over 22930.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01094, ecapa_loss=0.0001669, whisper_loss=0.09047, over 3870442.12 frames. ], batch size: 91, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:57:42,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1753080.0, ans=0.2 2024-08-12 17:57:43,873 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 17:57:46,654 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 12 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 17:58:02,142 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 17:58:04,467 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.419e+01 2.702e+01 3.143e+01 2.017e+02, threshold=5.404e+01, percent-clipped=3.0 2024-08-12 17:58:15,646 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 17:59:02,995 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1450, loss[loss=0.1098, beats_loss=0.008805, ecapa_loss=0.0002029, whisper_loss=0.099, over 20007.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01088, ecapa_loss=0.0001674, whisper_loss=0.09034, over 3863352.08 frames. ], batch size: 83, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:59:03,255 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 17:59:07,427 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 19 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 17:59:13,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1753580.0, ans=0.125 2024-08-12 17:59:19,158 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 17:59:23,470 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 17:59:46,990 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 17:59:55,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=15.0 2024-08-12 18:00:00,269 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
21 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 18:00:01,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1753880.0, ans=0.2 2024-08-12 18:00:01,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1753880.0, ans=0.0 2024-08-12 18:00:21,805 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1500, loss[loss=0.07693, beats_loss=0.01342, ecapa_loss=0.000158, whisper_loss=0.06194, over 13494.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0109, ecapa_loss=0.0001682, whisper_loss=0.08956, over 3842952.94 frames. ], batch size: 54, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:00:24,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1754080.0, ans=0.125 2024-08-12 18:00:43,300 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 18:00:50,959 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.459e+01 2.780e+01 3.185e+01 5.902e+01, threshold=5.561e+01, percent-clipped=1.0 2024-08-12 18:01:10,591 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 18:01:12,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1754380.0, ans=0.125 2024-08-12 18:01:17,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1754380.0, ans=0.1 2024-08-12 18:01:35,391 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
13 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 18:01:35,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1754480.0, ans=0.125 2024-08-12 18:01:36,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-08-12 18:01:41,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1550, loss[loss=0.1089, beats_loss=0.008877, ecapa_loss=0.0001876, whisper_loss=0.0981, over 14251.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01086, ecapa_loss=0.0001675, whisper_loss=0.08944, over 3839766.77 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:01:55,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=12.0 2024-08-12 18:02:04,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1754680.0, ans=10.0 2024-08-12 18:02:15,060 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 18:02:44,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1754980.0, ans=0.125 2024-08-12 18:02:49,479 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 18:02:54,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1754980.0, ans=0.125 2024-08-12 18:02:58,254 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1600, loss[loss=0.08207, beats_loss=0.01385, ecapa_loss=0.0001623, whisper_loss=0.0666, over 21971.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01084, ecapa_loss=0.0001664, whisper_loss=0.09003, over 3841607.98 frames. 
], batch size: 91, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:03:12,172 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 18:03:12,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1755180.0, ans=0.2 2024-08-12 18:03:25,478 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.499e+01 2.878e+01 3.295e+01 8.050e+01, threshold=5.757e+01, percent-clipped=1.0 2024-08-12 18:03:41,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1755380.0, ans=0.0 2024-08-12 18:03:44,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1755380.0, ans=0.125 2024-08-12 18:04:04,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1755480.0, ans=0.125 2024-08-12 18:04:14,022 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1650, loss[loss=0.1186, beats_loss=0.01012, ecapa_loss=0.0001515, whisper_loss=0.107, over 24126.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01082, ecapa_loss=0.0001661, whisper_loss=0.09042, over 3830352.68 frames. ], batch size: 91, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:04:28,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1755680.0, ans=0.2 2024-08-12 18:04:42,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1755680.0, ans=0.2 2024-08-12 18:05:03,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1755880.0, ans=0.125 2024-08-12 18:05:17,696 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
14 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 18:05:29,034 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1700, loss[loss=0.1195, beats_loss=0.009579, ecapa_loss=0.0001553, whisper_loss=0.1083, over 14657.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0108, ecapa_loss=0.0001657, whisper_loss=0.09116, over 3849087.61 frames. ], batch size: 55, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:05:38,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=12.0 2024-08-12 18:05:42,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1756080.0, ans=10.0 2024-08-12 18:05:42,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1756080.0, ans=0.125 2024-08-12 18:05:48,496 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-12 18:05:56,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.398e+01 2.715e+01 2.937e+01 4.103e+01, threshold=5.430e+01, percent-clipped=0.0 2024-08-12 18:05:59,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-12 18:06:01,985 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 18:06:14,695 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
30 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 18:06:28,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1756480.0, ans=0.125 2024-08-12 18:06:34,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1756480.0, ans=0.125 2024-08-12 18:06:42,277 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1750, loss[loss=0.09046, beats_loss=0.01277, ecapa_loss=0.0001649, whisper_loss=0.07604, over 22396.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.0001655, whisper_loss=0.0913, over 3858500.33 frames. ], batch size: 93, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:06:47,407 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.083e-02 2024-08-12 18:06:55,621 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 19 from LS+wenet, 26 from Vox, 47 fro AS 2024-08-12 18:07:03,394 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-08-12 18:07:05,448 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 41 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 18:07:17,069 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 18:07:17,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1756780.0, ans=0.125 2024-08-12 18:07:17,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1756780.0, ans=0.125 2024-08-12 18:07:29,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1756880.0, ans=0.0 2024-08-12 18:07:30,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1756880.0, ans=0.5 2024-08-12 18:07:49,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1756980.0, ans=0.0 2024-08-12 18:07:55,188 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1800, loss[loss=0.1023, beats_loss=0.009131, ecapa_loss=0.0001799, whisper_loss=0.09133, over 17731.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01074, ecapa_loss=0.0001661, whisper_loss=0.09188, over 3833245.70 frames. ], batch size: 66, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:08:08,504 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 18:08:20,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1757180.0, ans=0.0 2024-08-12 18:08:21,731 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.466e+01 2.734e+01 3.019e+01 6.645e+01, threshold=5.468e+01, percent-clipped=2.0 2024-08-12 18:08:24,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1757280.0, ans=0.0 2024-08-12 18:08:45,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1757380.0, ans=0.0 2024-08-12 18:08:54,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-08-12 18:09:05,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.35 vs. limit=6.0 2024-08-12 18:09:08,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1850, loss[loss=0.1116, beats_loss=0.01082, ecapa_loss=0.0001448, whisper_loss=0.09938, over 23233.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01081, ecapa_loss=0.0001666, whisper_loss=0.09161, over 3854347.72 frames. ], batch size: 91, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:09:15,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=8.0 2024-08-12 18:09:17,960 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 18:09:29,838 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.92 vs. 
limit=15.0 2024-08-12 18:09:31,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1757680.0, ans=0.125 2024-08-12 18:09:36,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1757780.0, ans=0.125 2024-08-12 18:10:00,604 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 18:10:08,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1757980.0, ans=0.1 2024-08-12 18:10:18,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1757980.0, ans=0.125 2024-08-12 18:10:20,661 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1900, loss[loss=0.1198, beats_loss=0.01231, ecapa_loss=0.000137, whisper_loss=0.1061, over 25061.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01089, ecapa_loss=0.000166, whisper_loss=0.09108, over 3845658.44 frames. ], batch size: 96, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:10:28,099 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 18:10:47,044 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.395e+01 2.725e+01 3.038e+01 6.504e+01, threshold=5.449e+01, percent-clipped=3.0 2024-08-12 18:10:48,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1758280.0, ans=0.0 2024-08-12 18:10:57,674 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 18:11:02,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1758280.0, ans=0.125 2024-08-12 18:11:05,048 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 18:11:19,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1758480.0, ans=0.125 2024-08-12 18:11:34,164 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 1950, loss[loss=0.0791, beats_loss=0.01119, ecapa_loss=0.0001724, whisper_loss=0.06618, over 20080.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01082, ecapa_loss=0.000166, whisper_loss=0.09148, over 3849900.48 frames. ], batch size: 81, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:11:37,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1758580.0, ans=0.125 2024-08-12 18:11:59,053 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.09 vs. limit=10.0 2024-08-12 18:12:00,569 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.52 vs. limit=10.0 2024-08-12 18:12:01,076 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 18:12:02,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1758780.0, ans=0.0 2024-08-12 18:12:06,628 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 18:12:10,155 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
26 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-12 18:12:29,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1758880.0, ans=0.0 2024-08-12 18:12:34,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.71 vs. limit=22.5 2024-08-12 18:12:37,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1758980.0, ans=0.1 2024-08-12 18:12:38,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1758980.0, ans=0.0 2024-08-12 18:12:44,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1758980.0, ans=0.0 2024-08-12 18:12:47,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1759080.0, ans=0.125 2024-08-12 18:12:48,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2000, loss[loss=0.0936, beats_loss=0.01388, ecapa_loss=0.0001291, whisper_loss=0.07843, over 17398.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01083, ecapa_loss=0.0001666, whisper_loss=0.0917, over 3839245.60 frames. 
], batch size: 64, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:13:03,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1759180.0, ans=0.125 2024-08-12 18:13:03,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1759180.0, ans=0.0 2024-08-12 18:13:15,378 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.512e+01 2.812e+01 3.299e+01 5.299e+01, threshold=5.623e+01, percent-clipped=0.0 2024-08-12 18:13:19,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5 2024-08-12 18:13:24,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1759280.0, ans=0.0 2024-08-12 18:13:26,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=12.0 2024-08-12 18:13:29,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1759280.0, ans=0.1 2024-08-12 18:13:36,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1759380.0, ans=0.1 2024-08-12 18:13:44,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1759380.0, ans=0.125 2024-08-12 18:14:01,993 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2050, loss[loss=0.09301, beats_loss=0.01181, ecapa_loss=0.0001436, whisper_loss=0.07976, over 22396.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01092, ecapa_loss=0.0001655, whisper_loss=0.09117, over 3859225.98 frames. 
], batch size: 90, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:14:02,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1759580.0, ans=0.2 2024-08-12 18:14:07,078 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=12.0 2024-08-12 18:14:09,544 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-12 18:14:18,246 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 18:14:52,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1759880.0, ans=0.125 2024-08-12 18:15:18,218 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2100, loss[loss=0.09189, beats_loss=0.01054, ecapa_loss=0.0001807, whisper_loss=0.07954, over 20168.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01095, ecapa_loss=0.0001663, whisper_loss=0.09142, over 3861713.97 frames. ], batch size: 81, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:15:18,403 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 18:15:23,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1760080.0, ans=0.1 2024-08-12 18:15:42,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1760180.0, ans=0.2 2024-08-12 18:15:43,250 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.436e+01 2.700e+01 3.111e+01 5.079e+01, threshold=5.401e+01, percent-clipped=0.0 2024-08-12 18:15:50,207 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
19 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-12 18:15:56,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.84 vs. limit=15.0 2024-08-12 18:16:00,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1760380.0, ans=0.1 2024-08-12 18:16:30,193 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2150, loss[loss=0.09689, beats_loss=0.01058, ecapa_loss=0.0001406, whisper_loss=0.08491, over 15122.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01098, ecapa_loss=0.0001661, whisper_loss=0.09147, over 3841505.09 frames. ], batch size: 56, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:16:31,082 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2024-08-12 18:16:51,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1760680.0, ans=0.04949747468305833 2024-08-12 18:16:57,027 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 18:17:00,335 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2024-08-12 18:17:08,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1760780.0, ans=0.125 2024-08-12 18:17:25,242 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 17 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-12 18:17:37,447 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2200, loss[loss=0.1063, beats_loss=0.01044, ecapa_loss=0.0001521, whisper_loss=0.09434, over 17936.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.011, ecapa_loss=0.000167, whisper_loss=0.09119, over 3799133.98 frames. ], batch size: 71, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:17:38,857 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 18:18:00,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.455e+01 2.695e+01 3.002e+01 4.139e+01, threshold=5.389e+01, percent-clipped=0.0 2024-08-12 18:18:16,329 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 18:18:20,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1761380.0, ans=0.125 2024-08-12 18:18:30,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1761480.0, ans=0.125 2024-08-12 18:18:32,152 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 18:18:35,919 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 18:18:39,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1761480.0, ans=0.125 2024-08-12 18:18:42,697 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2250, loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.0001925, whisper_loss=0.09147, over 19393.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01109, ecapa_loss=0.000168, whisper_loss=0.091, over 3795086.52 frames. ], batch size: 80, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:18:56,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.40 vs. 
limit=22.5 2024-08-12 18:19:11,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1761780.0, ans=0.0 2024-08-12 18:19:20,242 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 18:19:23,127 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=37.66 vs. limit=22.5 2024-08-12 18:19:34,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1761980.0, ans=0.0 2024-08-12 18:19:36,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1761980.0, ans=0.125 2024-08-12 18:19:37,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1761980.0, ans=0.1 2024-08-12 18:19:42,930 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.54 vs. limit=22.5 2024-08-12 18:19:47,219 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2300, loss[loss=0.1284, beats_loss=0.008703, ecapa_loss=0.0002019, whisper_loss=0.1177, over 22849.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01111, ecapa_loss=0.0001692, whisper_loss=0.09127, over 3859374.41 frames. ], batch size: 93, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:19:47,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1762080.0, ans=0.0 2024-08-12 18:19:58,783 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-12 18:20:05,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.53 vs. 
limit=10.0 2024-08-12 18:20:05,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1762180.0, ans=0.0 2024-08-12 18:20:10,980 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.458e+01 2.734e+01 3.155e+01 5.696e+01, threshold=5.468e+01, percent-clipped=1.0 2024-08-12 18:20:11,215 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-12 18:20:17,819 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 18:20:35,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1762380.0, ans=22.5 2024-08-12 18:20:39,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=12.0 2024-08-12 18:20:48,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1762480.0, ans=0.2 2024-08-12 18:20:52,981 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2350, loss[loss=0.09639, beats_loss=0.01188, ecapa_loss=0.0001687, whisper_loss=0.08283, over 21105.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01099, ecapa_loss=0.0001695, whisper_loss=0.09224, over 3826333.94 frames. ], batch size: 87, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:20:56,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1762580.0, ans=0.04949747468305833 2024-08-12 18:21:00,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1762580.0, ans=0.125 2024-08-12 18:21:03,782 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 18:21:22,002 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 18:21:31,220 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 18:21:36,639 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 18:21:38,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1762880.0, ans=0.125 2024-08-12 18:21:49,665 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 25 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-12 18:21:58,474 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2400, loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001453, whisper_loss=0.09007, over 23575.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01088, ecapa_loss=0.0001691, whisper_loss=0.09233, over 3841272.46 frames. ], batch size: 92, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:22:02,838 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 18:22:03,903 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05874495208263397, model_norm_threshold=54.68092727661133 2024-08-12 18:22:04,084 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.98, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.484e+05, grad_sumsq=9.566e+04, orig_rms_sq=8.869e+00 2024-08-12 18:22:22,663 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.513e+01 2.845e+01 3.166e+01 9.308e+02, threshold=5.690e+01, percent-clipped=1.0 2024-08-12 18:22:36,143 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 14 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 18:22:45,103 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
20 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 18:22:58,048 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-12 18:23:04,452 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2450, loss[loss=0.0819, beats_loss=0.01412, ecapa_loss=0.0001823, whisper_loss=0.06596, over 18423.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.0001694, whisper_loss=0.09147, over 3820340.54 frames. ], batch size: 79, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:23:05,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1763580.0, ans=0.125 2024-08-12 18:23:05,279 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.11 vs. limit=10.0 2024-08-12 18:23:05,939 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 18:23:07,053 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 18:23:16,723 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 18:23:16,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1763680.0, ans=0.125 2024-08-12 18:23:33,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1763780.0, ans=0.05 2024-08-12 18:23:41,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1763780.0, ans=0.125 2024-08-12 18:23:45,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1763880.0, ans=0.1 2024-08-12 18:23:55,776 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 18:24:09,635 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2500, loss[loss=0.09372, beats_loss=0.01267, ecapa_loss=0.0001508, whisper_loss=0.07955, over 21225.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01091, ecapa_loss=0.0001691, whisper_loss=0.09147, over 3844789.12 frames. 
], batch size: 84, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:24:32,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1764180.0, ans=0.0 2024-08-12 18:24:32,898 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.522e+01 2.839e+01 3.431e+01 9.983e+01, threshold=5.678e+01, percent-clipped=1.0 2024-08-12 18:24:39,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1764280.0, ans=0.125 2024-08-12 18:24:46,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1764280.0, ans=0.125 2024-08-12 18:24:46,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1764280.0, ans=0.2 2024-08-12 18:24:55,443 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 18:24:58,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.52 vs. limit=12.0 2024-08-12 18:25:14,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1764580.0, ans=0.125 2024-08-12 18:25:15,293 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2550, loss[loss=0.1165, beats_loss=0.01058, ecapa_loss=0.0001388, whisper_loss=0.1045, over 20013.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01089, ecapa_loss=0.0001704, whisper_loss=0.09151, over 3841905.63 frames. 
], batch size: 74, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:25:24,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1764580.0, ans=0.0 2024-08-12 18:25:26,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1764580.0, ans=0.125 2024-08-12 18:25:32,656 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 18:25:36,938 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 18:25:39,367 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 18:25:43,836 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.257e-01 2024-08-12 18:25:44,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1764780.0, ans=0.2 2024-08-12 18:25:47,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1764780.0, ans=0.125 2024-08-12 18:25:52,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1764780.0, ans=0.04949747468305833 2024-08-12 18:26:13,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1764980.0, ans=0.125 2024-08-12 18:26:18,166 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 18:26:20,705 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2600, loss[loss=0.07593, beats_loss=0.01333, ecapa_loss=0.0001658, whisper_loss=0.06094, over 18528.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01086, ecapa_loss=0.0001704, whisper_loss=0.09203, over 3852583.17 frames. 
], batch size: 77, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:26:21,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1765080.0, ans=0.0 2024-08-12 18:26:24,478 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 18:26:43,704 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.522e+01 2.874e+01 3.178e+01 1.791e+02, threshold=5.747e+01, percent-clipped=2.0 2024-08-12 18:26:54,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1765280.0, ans=0.2 2024-08-12 18:27:09,844 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-12 18:27:20,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1765480.0, ans=0.0 2024-08-12 18:27:25,519 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2650, loss[loss=0.08598, beats_loss=0.01126, ecapa_loss=0.0001759, whisper_loss=0.07296, over 19981.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01083, ecapa_loss=0.000173, whisper_loss=0.092, over 3848084.88 frames. ], batch size: 84, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:27:25,765 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 18:27:40,113 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
26 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-12 18:27:47,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1765680.0, ans=0.0 2024-08-12 18:27:53,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1765780.0, ans=0.1 2024-08-12 18:28:00,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1765780.0, ans=0.125 2024-08-12 18:28:01,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1765780.0, ans=0.125 2024-08-12 18:28:03,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-08-12 18:28:04,191 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.234e-03 2024-08-12 18:28:18,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1765980.0, ans=0.1 2024-08-12 18:28:26,481 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-12 18:28:31,395 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2700, loss[loss=0.1018, beats_loss=0.01261, ecapa_loss=0.0001314, whisper_loss=0.08788, over 23105.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01084, ecapa_loss=0.0001724, whisper_loss=0.09211, over 3842610.19 frames. ], batch size: 88, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:28:36,975 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
27 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 18:28:37,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1766080.0, ans=0.125 2024-08-12 18:28:43,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1766180.0, ans=0.0 2024-08-12 18:28:48,403 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 18:28:54,731 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.345e+01 2.624e+01 3.036e+01 4.476e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-12 18:29:06,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.31 vs. limit=15.0 2024-08-12 18:29:10,710 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 18:29:18,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1766380.0, ans=0.0 2024-08-12 18:29:20,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1766380.0, ans=0.1 2024-08-12 18:29:36,533 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2750, loss[loss=0.112, beats_loss=0.01083, ecapa_loss=0.0002183, whisper_loss=0.09896, over 22147.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01085, ecapa_loss=0.0001727, whisper_loss=0.09235, over 3831557.40 frames. ], batch size: 95, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:29:36,965 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-12 18:29:52,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1766680.0, ans=0.125 2024-08-12 18:30:10,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1766780.0, ans=0.125 2024-08-12 18:30:15,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1766880.0, ans=0.125 2024-08-12 18:30:17,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1766880.0, ans=0.035 2024-08-12 18:30:27,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1766880.0, ans=0.0 2024-08-12 18:30:42,382 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2800, loss[loss=0.09753, beats_loss=0.01011, ecapa_loss=0.0001667, whisper_loss=0.08575, over 17190.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01092, ecapa_loss=0.0001721, whisper_loss=0.09255, over 3847773.58 frames. ], batch size: 70, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:30:52,806 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 18:31:02,461 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-12 18:31:06,289 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.517e+01 2.667e+01 2.964e+01 5.320e+01, threshold=5.335e+01, percent-clipped=1.0 2024-08-12 18:31:29,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0 2024-08-12 18:31:30,440 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
14 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 18:31:39,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1767480.0, ans=0.0 2024-08-12 18:31:41,192 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 18:31:41,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1767480.0, ans=0.1 2024-08-12 18:31:45,157 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 18:31:48,885 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2850, loss[loss=0.1077, beats_loss=0.007609, ecapa_loss=0.0001888, whisper_loss=0.09819, over 14237.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01094, ecapa_loss=0.0001714, whisper_loss=0.09249, over 3816142.07 frames. ], batch size: 55, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:31:55,280 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 18:31:59,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1767580.0, ans=0.125 2024-08-12 18:32:08,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1767680.0, ans=0.0 2024-08-12 18:32:16,093 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 18:32:28,158 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 18:32:28,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1767880.0, ans=0.125 2024-08-12 18:32:38,374 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
21 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-12 18:32:47,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1767980.0, ans=0.1 2024-08-12 18:32:51,646 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-12 18:32:53,801 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2900, loss[loss=0.0898, beats_loss=0.009961, ecapa_loss=0.0002151, whisper_loss=0.07768, over 18896.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001725, whisper_loss=0.09162, over 3841074.80 frames. ], batch size: 81, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:32:56,862 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.918e-01 2024-08-12 18:33:17,862 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 18:33:19,007 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.482e+01 2.869e+01 3.422e+01 8.599e+01, threshold=5.738e+01, percent-clipped=1.0 2024-08-12 18:33:43,589 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 18:33:50,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-12 18:34:00,688 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 2950, loss[loss=0.1046, beats_loss=0.01067, ecapa_loss=0.000181, whisper_loss=0.09216, over 18798.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.0001731, whisper_loss=0.09179, over 3829989.72 frames. 
], batch size: 77, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:34:12,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1768680.0, ans=0.125 2024-08-12 18:34:45,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1768880.0, ans=0.0 2024-08-12 18:34:45,714 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0 2024-08-12 18:34:57,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1768980.0, ans=0.2 2024-08-12 18:35:10,460 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3000, loss[loss=0.08347, beats_loss=0.01267, ecapa_loss=0.0002039, whisper_loss=0.06876, over 17186.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01089, ecapa_loss=0.0001733, whisper_loss=0.09199, over 3878201.86 frames. ], batch size: 75, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:35:10,460 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 18:35:46,386 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on ASR_libri: loss=0.2551, beats_loss=0, ecapa_loss=0.0005879, whisper_loss=0.2492, over 922467.00 frames. 2024-08-12 18:36:04,768 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on SV_voxceleb1: loss=0.004639, beats_loss=0, ecapa_loss=0.0004639, whisper_loss=0, over 939242.00 frames. 2024-08-12 18:36:55,605 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7069, 3.1647, 2.4976, 1.9259], device='cuda:2') 2024-08-12 18:37:53,603 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on AT_audioset: loss=0.02413, beats_loss=0.02413, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 18:37:53,607 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 18:37:54,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0 2024-08-12 18:37:59,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1769080.0, ans=0.025 2024-08-12 18:38:10,526 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 18:38:18,241 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.438e+01 2.713e+01 3.016e+01 4.001e+01, threshold=5.426e+01, percent-clipped=0.0 2024-08-12 18:38:25,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1769280.0, ans=0.1 2024-08-12 18:38:25,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.19 vs. limit=22.5 2024-08-12 18:38:31,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.50 vs. limit=15.0 2024-08-12 18:38:32,240 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.11 vs. limit=10.0 2024-08-12 18:38:35,705 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
17 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 18:38:40,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1769380.0, ans=0.125 2024-08-12 18:38:41,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1769380.0, ans=0.125 2024-08-12 18:38:48,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1769480.0, ans=0.125 2024-08-12 18:38:56,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1769480.0, ans=0.125 2024-08-12 18:38:59,970 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3050, loss[loss=0.08268, beats_loss=0.009735, ecapa_loss=0.0002514, whisper_loss=0.07043, over 17165.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001721, whisper_loss=0.09172, over 3892472.50 frames. ], batch size: 71, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:39:01,794 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 38 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-12 18:39:02,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1769580.0, ans=0.0 2024-08-12 18:39:05,772 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 18:39:11,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1769580.0, ans=0.5 2024-08-12 18:39:14,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1769680.0, ans=0.2 2024-08-12 18:39:15,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1769680.0, ans=0.04949747468305833 2024-08-12 18:39:25,130 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 30 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 18:39:36,664 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.84 vs. limit=10.0 2024-08-12 18:39:49,778 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-12 18:39:51,449 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-12 18:39:58,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1769980.0, ans=0.1 2024-08-12 18:40:09,374 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3100, loss[loss=0.1208, beats_loss=0.0108, ecapa_loss=0.0001513, whisper_loss=0.1085, over 22210.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01095, ecapa_loss=0.0001723, whisper_loss=0.0924, over 3917694.79 frames. 
], batch size: 84, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:40:22,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1770180.0, ans=0.125 2024-08-12 18:40:27,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1770180.0, ans=0.0 2024-08-12 18:40:34,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1770180.0, ans=0.125 2024-08-12 18:40:36,329 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.497e+01 2.868e+01 3.286e+01 7.289e+01, threshold=5.737e+01, percent-clipped=2.0 2024-08-12 18:40:52,038 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 18:41:03,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1770380.0, ans=0.0 2024-08-12 18:41:19,075 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2024-08-12 18:41:21,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3150, loss[loss=0.1113, beats_loss=0.0107, ecapa_loss=0.0001577, whisper_loss=0.09903, over 17374.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.011, ecapa_loss=0.0001718, whisper_loss=0.09266, over 3884676.10 frames. ], batch size: 69, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:41:34,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1770680.0, ans=0.125 2024-08-12 18:41:40,735 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 18:41:48,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1770780.0, ans=0.0 2024-08-12 18:41:51,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1770780.0, ans=0.125 2024-08-12 18:41:57,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1770780.0, ans=0.0 2024-08-12 18:42:15,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1770880.0, ans=0.2 2024-08-12 18:42:21,745 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 18:42:29,001 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-12 18:42:31,558 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 18:42:33,390 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:42:34,311 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3200, loss[loss=0.1203, beats_loss=0.01117, ecapa_loss=0.0001558, whisper_loss=0.1075, over 22455.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01099, ecapa_loss=0.0001717, whisper_loss=0.09302, over 3855460.41 frames. ], batch size: 87, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:42:37,104 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
23 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 18:42:45,214 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.884e+00 2024-08-12 18:42:56,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1771180.0, ans=0.0 2024-08-12 18:43:00,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1771180.0, ans=0.5 2024-08-12 18:43:02,836 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.434e+01 2.699e+01 3.191e+01 8.641e+01, threshold=5.397e+01, percent-clipped=3.0 2024-08-12 18:43:13,214 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 18:43:46,065 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3250, loss[loss=0.113, beats_loss=0.009239, ecapa_loss=0.0001964, whisper_loss=0.1018, over 21623.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01093, ecapa_loss=0.000172, whisper_loss=0.09313, over 3845859.12 frames. ], batch size: 87, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:43:50,885 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-12 18:43:54,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1771580.0, ans=0.125 2024-08-12 18:44:08,639 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-12 18:44:56,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1771980.0, ans=0.125 2024-08-12 18:44:58,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3300, loss[loss=0.1032, beats_loss=0.007095, ecapa_loss=0.0001927, whisper_loss=0.09417, over 16759.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.011, ecapa_loss=0.0001731, whisper_loss=0.09208, over 3861997.18 frames. ], batch size: 66, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:45:24,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1772180.0, ans=0.04949747468305833 2024-08-12 18:45:26,625 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.521e+01 2.800e+01 3.274e+01 5.621e+01, threshold=5.601e+01, percent-clipped=1.0 2024-08-12 18:45:26,823 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 18:45:28,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1772280.0, ans=0.2 2024-08-12 18:45:36,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1772280.0, ans=0.125 2024-08-12 18:45:40,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=12.0 2024-08-12 18:45:46,793 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 18:45:47,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1772380.0, ans=0.125 2024-08-12 18:45:55,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1772480.0, ans=0.0 2024-08-12 18:46:11,238 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3350, loss[loss=0.09427, beats_loss=0.01176, ecapa_loss=0.0001567, whisper_loss=0.08094, over 17176.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01099, ecapa_loss=0.000173, whisper_loss=0.09197, over 3862881.00 frames. 
], batch size: 68, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:46:12,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1772580.0, ans=10.0 2024-08-12 18:46:40,978 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 18:46:49,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1772780.0, ans=0.125 2024-08-12 18:47:10,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1772980.0, ans=0.125 2024-08-12 18:47:14,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1772980.0, ans=0.125 2024-08-12 18:47:22,793 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3400, loss[loss=0.1355, beats_loss=0.00928, ecapa_loss=0.0001559, whisper_loss=0.1247, over 18162.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01101, ecapa_loss=0.0001716, whisper_loss=0.09161, over 3882865.05 frames. ], batch size: 68, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:47:33,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1773080.0, ans=0.125 2024-08-12 18:47:37,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=12.0 2024-08-12 18:47:48,006 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 18:47:50,752 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.407e+01 2.669e+01 3.067e+01 7.735e+01, threshold=5.339e+01, percent-clipped=1.0 2024-08-12 18:47:55,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2024-08-12 18:47:58,194 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 18:47:58,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2024-08-12 18:47:59,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1773280.0, ans=0.2 2024-08-12 18:48:08,024 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 18:48:34,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1773480.0, ans=0.0 2024-08-12 18:48:36,434 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2024-08-12 18:48:36,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3450, loss[loss=0.1283, beats_loss=0.009089, ecapa_loss=0.0001893, whisper_loss=0.1173, over 21673.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01101, ecapa_loss=0.0001721, whisper_loss=0.09087, over 3867963.54 frames. 
], batch size: 85, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:48:39,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1773580.0, ans=0.125 2024-08-12 18:48:56,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=12.0 2024-08-12 18:48:57,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1773680.0, ans=0.125 2024-08-12 18:48:59,487 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 11 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 18:49:04,067 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 34 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 18:49:09,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1773780.0, ans=0.5 2024-08-12 18:49:27,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1773880.0, ans=0.2 2024-08-12 18:49:32,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1773980.0, ans=0.125 2024-08-12 18:49:38,607 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:49:39,466 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 18:49:47,615 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3500, loss[loss=0.1104, beats_loss=0.01179, ecapa_loss=0.000154, whisper_loss=0.09707, over 20945.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01099, ecapa_loss=0.0001738, whisper_loss=0.09123, over 3851628.27 frames. ], batch size: 80, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:49:51,576 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 18:50:09,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1774180.0, ans=0.0 2024-08-12 18:50:14,600 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.564e+01 2.746e+01 3.042e+01 5.198e+01, threshold=5.491e+01, percent-clipped=0.0 2024-08-12 18:50:20,337 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 18:50:32,069 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.23 vs. limit=10.0 2024-08-12 18:50:35,550 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 18:50:55,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1774480.0, ans=0.0 2024-08-12 18:50:58,440 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3550, loss[loss=0.0953, beats_loss=0.009297, ecapa_loss=0.0001523, whisper_loss=0.08448, over 21392.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01106, ecapa_loss=0.000172, whisper_loss=0.09057, over 3856260.33 frames. ], batch size: 78, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:51:03,202 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-12 18:51:14,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1774680.0, ans=0.1 2024-08-12 18:51:29,776 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.88 vs. 
limit=15.0 2024-08-12 18:51:57,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1774980.0, ans=0.1 2024-08-12 18:52:10,173 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 19 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-12 18:52:11,214 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3600, loss[loss=0.08088, beats_loss=0.01426, ecapa_loss=0.0001951, whisper_loss=0.06467, over 20127.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0109, ecapa_loss=0.0001744, whisper_loss=0.0921, over 3853762.65 frames. ], batch size: 88, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:52:23,625 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 18:52:26,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1775180.0, ans=0.125 2024-08-12 18:52:31,018 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 18:52:36,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1775180.0, ans=0.125 2024-08-12 18:52:37,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1775180.0, ans=0.2 2024-08-12 18:52:38,743 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.432e+01 2.743e+01 3.098e+01 5.002e+01, threshold=5.485e+01, percent-clipped=0.0 2024-08-12 18:52:40,062 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 18:52:52,507 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 18:53:00,639 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 18:53:02,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1775380.0, ans=0.125 2024-08-12 18:53:08,914 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 18:53:23,525 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3650, loss[loss=0.1102, beats_loss=0.01023, ecapa_loss=0.000201, whisper_loss=0.09791, over 21121.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01097, ecapa_loss=0.000174, whisper_loss=0.09168, over 3852289.50 frames. ], batch size: 89, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:53:30,771 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 18:53:32,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1775580.0, ans=0.125 2024-08-12 18:53:37,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1775680.0, ans=0.0 2024-08-12 18:53:50,436 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 29 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 18:53:59,581 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 18:54:01,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1775780.0, ans=0.0 2024-08-12 18:54:33,896 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.08 vs. limit=22.5 2024-08-12 18:54:36,026 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3700, loss[loss=0.1125, beats_loss=0.01207, ecapa_loss=0.0001601, whisper_loss=0.09887, over 23460.00 frames. 
], tot_loss[loss=0.1053, beats_loss=0.0109, ecapa_loss=0.0001738, whisper_loss=0.09268, over 3861827.43 frames. ], batch size: 92, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:54:43,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1776080.0, ans=0.125 2024-08-12 18:54:49,220 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 18:55:00,802 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:55:03,068 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.392e+01 2.654e+01 3.110e+01 5.350e+01, threshold=5.308e+01, percent-clipped=0.0 2024-08-12 18:55:06,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-12 18:55:25,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1776380.0, ans=0.0 2024-08-12 18:55:31,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1776380.0, ans=0.125 2024-08-12 18:55:35,357 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 28 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 18:55:37,251 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2024-08-12 18:55:38,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1776480.0, ans=0.0 2024-08-12 18:55:44,429 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
21 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 18:55:46,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1776480.0, ans=0.04949747468305833 2024-08-12 18:55:48,288 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3750, loss[loss=0.1182, beats_loss=0.009325, ecapa_loss=0.0001759, whisper_loss=0.1071, over 19187.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0109, ecapa_loss=0.000173, whisper_loss=0.09284, over 3861322.63 frames. ], batch size: 72, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:55:53,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1776580.0, ans=0.125 2024-08-12 18:55:56,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1776580.0, ans=0.04949747468305833 2024-08-12 18:56:04,504 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 18:56:11,542 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-08-12 18:56:15,613 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 18:56:19,842 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 18:56:23,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1776780.0, ans=0.1 2024-08-12 18:56:31,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1776780.0, ans=0.1 2024-08-12 18:56:38,042 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
23 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-12 18:56:57,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2024-08-12 18:57:03,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3800, loss[loss=0.0759, beats_loss=0.01521, ecapa_loss=0.0001886, whisper_loss=0.0588, over 21241.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01092, ecapa_loss=0.0001735, whisper_loss=0.09295, over 3872804.92 frames. ], batch size: 94, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:57:22,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1777180.0, ans=0.0 2024-08-12 18:57:31,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.486e+01 2.799e+01 3.183e+01 6.177e+01, threshold=5.598e+01, percent-clipped=1.0 2024-08-12 18:57:33,439 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 18:57:59,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1777380.0, ans=10.0 2024-08-12 18:58:04,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1777480.0, ans=0.125 2024-08-12 18:58:09,761 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0 2024-08-12 18:58:19,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3850, loss[loss=0.09813, beats_loss=0.008937, ecapa_loss=0.0002232, whisper_loss=0.08696, over 19206.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01091, ecapa_loss=0.0001729, whisper_loss=0.09324, over 3903747.65 frames. 
], batch size: 79, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:58:51,068 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 18:58:57,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1777780.0, ans=0.0 2024-08-12 18:59:15,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1777880.0, ans=0.125 2024-08-12 18:59:19,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1777880.0, ans=0.07 2024-08-12 18:59:22,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1777980.0, ans=0.0 2024-08-12 18:59:25,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1777980.0, ans=0.125 2024-08-12 18:59:33,824 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 18:59:36,372 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3900, loss[loss=0.1032, beats_loss=0.01067, ecapa_loss=0.0002005, whisper_loss=0.09051, over 13270.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0109, ecapa_loss=0.0001743, whisper_loss=0.09346, over 3898483.18 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:59:36,800 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:59:55,017 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
32 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-12 19:00:03,039 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.890e-01 2024-08-12 19:00:05,258 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.460e+01 2.720e+01 3.134e+01 5.284e+01, threshold=5.440e+01, percent-clipped=0.0 2024-08-12 19:00:12,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1778280.0, ans=0.125 2024-08-12 19:00:14,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1778280.0, ans=0.125 2024-08-12 19:00:16,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1778280.0, ans=0.125 2024-08-12 19:00:16,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1778280.0, ans=0.125 2024-08-12 19:00:31,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1778380.0, ans=0.2 2024-08-12 19:00:32,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1778380.0, ans=0.1 2024-08-12 19:00:33,432 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 19:00:35,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1778380.0, ans=0.025 2024-08-12 19:00:53,600 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 3950, loss[loss=0.1148, beats_loss=0.007598, ecapa_loss=0.0001764, whisper_loss=0.1054, over 17205.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0108, ecapa_loss=0.0001747, whisper_loss=0.09446, over 3884793.06 frames. 
], batch size: 68, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:00:56,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.77 vs. limit=5.0 2024-08-12 19:01:18,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1778680.0, ans=0.125 2024-08-12 19:01:29,364 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 19:01:51,041 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 19:02:08,793 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4000, loss[loss=0.1163, beats_loss=0.01029, ecapa_loss=0.0001612, whisper_loss=0.1044, over 22481.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01083, ecapa_loss=0.0001741, whisper_loss=0.09402, over 3905665.03 frames. ], batch size: 88, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:02:11,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1779080.0, ans=0.125 2024-08-12 19:02:23,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1779180.0, ans=0.1 2024-08-12 19:02:37,277 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 19:02:39,532 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.414e+01 2.670e+01 2.988e+01 4.666e+01, threshold=5.339e+01, percent-clipped=0.0 2024-08-12 19:02:44,384 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 19:03:00,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1779380.0, ans=0.125 2024-08-12 19:03:03,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2024-08-12 19:03:05,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.76 vs. limit=22.5 2024-08-12 19:03:05,947 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-12 19:03:22,690 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 20 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 19:03:26,340 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.161e-02 2024-08-12 19:03:29,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4050, loss[loss=0.1075, beats_loss=0.01079, ecapa_loss=0.0001505, whisper_loss=0.09523, over 22253.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01083, ecapa_loss=0.0001749, whisper_loss=0.0946, over 3908373.13 frames. ], batch size: 86, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:03:29,311 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 37 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 19:03:35,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1779580.0, ans=0.0 2024-08-12 19:03:35,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1779580.0, ans=0.0 2024-08-12 19:03:45,866 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 19:03:50,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1779680.0, ans=0.125 2024-08-12 19:04:00,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1779780.0, ans=0.0 2024-08-12 19:04:11,823 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.360e-02 2024-08-12 19:04:30,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1779980.0, ans=0.0 2024-08-12 19:04:34,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1779980.0, ans=0.125 2024-08-12 19:04:34,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1779980.0, ans=0.0 2024-08-12 19:04:43,184 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 19:04:45,667 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2024-08-12 19:04:48,255 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4100, loss[loss=0.09528, beats_loss=0.01066, ecapa_loss=0.0002019, whisper_loss=0.0826, over 22176.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01076, ecapa_loss=0.0001751, whisper_loss=0.09484, over 3897804.10 frames. ], batch size: 93, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:04:48,378 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
27 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 19:05:01,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1780080.0, ans=0.025 2024-08-12 19:05:16,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.487e+01 2.905e+01 3.188e+01 5.523e+01, threshold=5.810e+01, percent-clipped=1.0 2024-08-12 19:05:17,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1780280.0, ans=0.125 2024-08-12 19:05:22,166 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 32 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 19:05:27,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1780280.0, ans=0.125 2024-08-12 19:05:28,272 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 19:05:33,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1780380.0, ans=0.07 2024-08-12 19:05:39,271 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-12 19:05:48,452 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 19:06:03,917 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 19:06:07,553 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4150, loss[loss=0.1122, beats_loss=0.01084, ecapa_loss=0.0001836, whisper_loss=0.09955, over 23564.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.0108, ecapa_loss=0.0001763, whisper_loss=0.09509, over 3903345.37 frames. 
], batch size: 95, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:06:11,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1780580.0, ans=0.125 2024-08-12 19:06:15,116 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 19:06:28,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2024-08-12 19:06:34,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1780680.0, ans=0.125 2024-08-12 19:06:44,153 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 19:06:56,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1780880.0, ans=0.0 2024-08-12 19:07:26,782 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4200, loss[loss=0.1228, beats_loss=0.008861, ecapa_loss=0.0001595, whisper_loss=0.1124, over 23536.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01087, ecapa_loss=0.0001747, whisper_loss=0.09419, over 3902909.24 frames. ], batch size: 92, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:07:28,407 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 19:07:33,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1781080.0, ans=0.2 2024-08-12 19:07:50,855 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
21 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 19:07:56,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.440e+01 2.909e+01 3.594e+01 1.116e+02, threshold=5.819e+01, percent-clipped=3.0 2024-08-12 19:07:57,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1781280.0, ans=0.0 2024-08-12 19:08:03,846 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2024-08-12 19:08:07,991 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-12 19:08:09,498 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 19:08:49,014 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4250, loss[loss=0.121, beats_loss=0.008705, ecapa_loss=0.0001423, whisper_loss=0.1109, over 14821.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01094, ecapa_loss=0.0001735, whisper_loss=0.09308, over 3890789.97 frames. ], batch size: 54, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:09:01,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1781580.0, ans=0.125 2024-08-12 19:09:48,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1781880.0, ans=0.125 2024-08-12 19:09:58,937 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 19:09:59,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=22.5 2024-08-12 19:10:05,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. 
limit=15.0 2024-08-12 19:10:07,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1782080.0, ans=0.125 2024-08-12 19:10:08,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4300, loss[loss=0.1261, beats_loss=0.008701, ecapa_loss=0.0001807, whisper_loss=0.1156, over 15336.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01099, ecapa_loss=0.0001719, whisper_loss=0.0923, over 3909752.86 frames. ], batch size: 59, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:10:11,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1782080.0, ans=0.0 2024-08-12 19:10:29,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2024-08-12 19:10:37,763 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.369e+01 2.676e+01 2.998e+01 4.612e+01, threshold=5.352e+01, percent-clipped=0.0 2024-08-12 19:10:57,961 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 19:11:22,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1782480.0, ans=0.04949747468305833 2024-08-12 19:11:23,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1782480.0, ans=0.125 2024-08-12 19:11:27,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4350, loss[loss=0.09157, beats_loss=0.01473, ecapa_loss=0.0001329, whisper_loss=0.07551, over 15707.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01091, ecapa_loss=0.0001739, whisper_loss=0.09222, over 3867354.79 frames. ], batch size: 63, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:11:42,664 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
18 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 19:11:52,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1782680.0, ans=0.125 2024-08-12 19:11:54,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1782680.0, ans=0.0 2024-08-12 19:12:41,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1782980.0, ans=0.07 2024-08-12 19:12:49,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4400, loss[loss=0.1111, beats_loss=0.01068, ecapa_loss=0.0001757, whisper_loss=0.09871, over 23387.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01087, ecapa_loss=0.000173, whisper_loss=0.09261, over 3877611.70 frames. ], batch size: 91, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:13:12,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1783180.0, ans=0.125 2024-08-12 19:13:12,774 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. 
limit=6.0 2024-08-12 19:13:13,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1783180.0, ans=0.1 2024-08-12 19:13:16,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1783180.0, ans=0.125 2024-08-12 19:13:21,574 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.411e+01 2.660e+01 2.962e+01 4.713e+01, threshold=5.320e+01, percent-clipped=0.0 2024-08-12 19:13:22,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1783280.0, ans=0.125 2024-08-12 19:13:40,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1783380.0, ans=0.1 2024-08-12 19:13:40,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1783380.0, ans=0.0 2024-08-12 19:14:13,483 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4450, loss[loss=0.09987, beats_loss=0.0112, ecapa_loss=0.0001486, whisper_loss=0.08719, over 20097.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0109, ecapa_loss=0.000172, whisper_loss=0.09226, over 3889929.87 frames. ], batch size: 76, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:14:17,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-12 19:14:22,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2024-08-12 19:14:34,911 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 19:14:35,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1783680.0, ans=0.125 2024-08-12 19:15:07,198 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 18 from LS+wenet, 21 from Vox, 53 fro AS 2024-08-12 19:15:19,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1783880.0, ans=0.125 2024-08-12 19:15:26,558 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 19:15:36,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1783980.0, ans=0.2 2024-08-12 19:15:41,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4500, loss[loss=0.1084, beats_loss=0.01092, ecapa_loss=0.0001667, whisper_loss=0.09585, over 14215.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0109, ecapa_loss=0.0001718, whisper_loss=0.09178, over 3886111.38 frames. ], batch size: 55, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:15:52,495 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-12 19:16:07,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1784180.0, ans=0.5 2024-08-12 19:16:10,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1784180.0, ans=0.125 2024-08-12 19:16:13,382 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.482e+01 2.920e+01 3.537e+01 6.104e+01, threshold=5.841e+01, percent-clipped=3.0 2024-08-12 19:17:07,607 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4550, loss[loss=0.0998, beats_loss=0.01103, ecapa_loss=0.0002059, whisper_loss=0.08671, over 20744.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001718, whisper_loss=0.09156, over 3872773.67 frames. ], batch size: 88, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:17:27,168 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 19:17:35,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1784680.0, ans=0.125 2024-08-12 19:17:37,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1784680.0, ans=0.1 2024-08-12 19:17:38,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1784680.0, ans=0.0 2024-08-12 19:17:48,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1784780.0, ans=0.1 2024-08-12 19:17:53,059 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 19:17:54,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1784780.0, ans=0.015 2024-08-12 19:18:25,536 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 19:18:26,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2024-08-12 19:18:33,610 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4600, loss[loss=0.1118, beats_loss=0.009936, ecapa_loss=0.0001691, whisper_loss=0.1002, over 15646.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01096, ecapa_loss=0.0001727, whisper_loss=0.09126, over 3853035.31 frames. 
], batch size: 59, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:18:35,785 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.109e-01 2024-08-12 19:18:52,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0 2024-08-12 19:18:54,546 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 19:18:56,353 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=12.0 2024-08-12 19:18:57,183 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 19:18:57,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.83 vs. limit=15.0 2024-08-12 19:19:00,782 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 19:19:04,188 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.452e+01 2.765e+01 3.164e+01 4.953e+01, threshold=5.531e+01, percent-clipped=0.0 2024-08-12 19:19:24,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1785380.0, ans=0.95 2024-08-12 19:19:35,107 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 19:19:36,901 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
25 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 19:19:37,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1785480.0, ans=0.125 2024-08-12 19:19:47,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1785480.0, ans=0.1 2024-08-12 19:19:52,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4650, loss[loss=0.1134, beats_loss=0.01171, ecapa_loss=0.0001617, whisper_loss=0.1001, over 16138.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01094, ecapa_loss=0.0001734, whisper_loss=0.09145, over 3848161.90 frames. ], batch size: 63, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:19:52,279 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 19:19:54,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1785580.0, ans=0.125 2024-08-12 19:20:01,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.71 vs. limit=22.5 2024-08-12 19:20:08,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1785680.0, ans=0.125 2024-08-12 19:20:14,114 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 19:20:17,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1785680.0, ans=0.1 2024-08-12 19:20:27,068 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 19:20:31,349 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
29 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 19:20:31,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=15.0 2024-08-12 19:20:34,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1785780.0, ans=0.125 2024-08-12 19:20:34,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1785780.0, ans=0.125 2024-08-12 19:20:54,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1785880.0, ans=0.125 2024-08-12 19:20:57,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1785980.0, ans=0.125 2024-08-12 19:21:07,427 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.471e+00 2024-08-12 19:21:12,682 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4700, loss[loss=0.06843, beats_loss=0.01365, ecapa_loss=0.0001556, whisper_loss=0.05323, over 17955.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001715, whisper_loss=0.09158, over 3884730.26 frames. ], batch size: 75, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:21:18,446 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 19:21:43,316 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.537e+01 2.789e+01 3.116e+01 4.712e+01, threshold=5.578e+01, percent-clipped=0.0 2024-08-12 19:21:45,952 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-12 19:22:03,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1786380.0, ans=0.2 2024-08-12 19:22:04,733 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 20 from Vox, 49 from AS 2024-08-12 19:22:10,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1786380.0, ans=0.0 2024-08-12 19:22:14,090 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 from AS 2024-08-12 19:22:26,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0 2024-08-12 19:22:32,890 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4750, loss[loss=0.1087, beats_loss=0.01252, ecapa_loss=0.0001933, whisper_loss=0.0942, over 14997.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.0001713, whisper_loss=0.09177, over 3927303.44 frames. ], batch size: 63, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:22:55,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=1786680.0, ans=0.2 2024-08-12 19:23:10,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1786780.0, ans=0.0 2024-08-12 19:23:19,569 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 24 from Vox, 31 from AS 2024-08-12 19:23:27,233 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 from AS 2024-08-12 19:23:50,878 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4800, loss[loss=0.1234, beats_loss=0.009937, ecapa_loss=0.0001608, whisper_loss=0.1118, over 17782.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01106, ecapa_loss=0.0001728, whisper_loss=0.09171, over 3924529.03 frames. 
], batch size: 67, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:23:53,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1787080.0, ans=0.1 2024-08-12 19:23:57,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1787080.0, ans=0.0 2024-08-12 19:23:59,159 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 22 from Vox, 27 from AS 2024-08-12 19:24:20,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.537e+01 2.789e+01 3.212e+01 6.421e+01, threshold=5.577e+01, percent-clipped=1.0 2024-08-12 19:24:20,678 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 17 from Vox, 25 from AS 2024-08-12 19:24:22,543 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 16 from Vox, 40 from AS 2024-08-12 19:24:26,207 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2024-08-12 19:24:33,497 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 from AS 2024-08-12 19:24:39,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1787380.0, ans=0.125 2024-08-12 19:24:54,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1787480.0, ans=0.125 2024-08-12 19:25:01,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1787480.0, ans=0.125 2024-08-12 19:25:03,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1787480.0, ans=0.2 2024-08-12 19:25:09,301 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
25 from LS+wenet, 22 from Vox, 46 from AS 2024-08-12 19:25:10,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4850, loss[loss=0.09096, beats_loss=0.01221, ecapa_loss=0.0001621, whisper_loss=0.07713, over 22703.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01111, ecapa_loss=0.0001721, whisper_loss=0.09139, over 3935889.62 frames. ], batch size: 93, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:25:10,819 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 13 from Vox, 25 from AS 2024-08-12 19:25:22,072 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 from AS 2024-08-12 19:25:33,515 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 27 from LS+wenet, 16 from Vox, 19 from AS 2024-08-12 19:25:34,957 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 30 from Vox, 30 from AS 2024-08-12 19:25:46,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-08-12 19:26:34,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1788080.0, ans=0.125 2024-08-12 19:26:35,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4900, loss[loss=0.1281, beats_loss=0.0129, ecapa_loss=0.00017, whisper_loss=0.1135, over 22593.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01097, ecapa_loss=0.0001728, whisper_loss=0.09189, over 3916865.75 frames. ], batch size: 89, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:26:40,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1788080.0, ans=0.0 2024-08-12 19:26:41,731 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 28 from Vox, 30 from AS 2024-08-12 19:27:06,396 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.493e+01 2.714e+01 3.066e+01 4.979e+01, threshold=5.428e+01, percent-clipped=0.0 2024-08-12 19:27:21,137 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 from AS 2024-08-12 19:27:36,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1788380.0, ans=0.0 2024-08-12 19:27:43,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1788480.0, ans=0.125 2024-08-12 19:27:56,323 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 4950, loss[loss=0.1152, beats_loss=0.01037, ecapa_loss=0.0001554, whisper_loss=0.1032, over 24072.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01091, ecapa_loss=0.0001738, whisper_loss=0.09239, over 3892632.04 frames. ], batch size: 94, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:28:09,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1788580.0, ans=0.1 2024-08-12 19:28:18,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1788680.0, ans=0.125 2024-08-12 19:28:41,099 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 from AS 2024-08-12 19:28:52,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1788880.0, ans=0.0 2024-08-12 19:28:55,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2024-08-12 19:28:58,391 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
26 from LS+wenet, 22 from Vox, 33 from AS 2024-08-12 19:29:15,522 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5000, loss[loss=0.1184, beats_loss=0.01187, ecapa_loss=0.0001822, whisper_loss=0.1047, over 22712.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01096, ecapa_loss=0.0001744, whisper_loss=0.09187, over 3896437.53 frames. ], batch size: 93, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:29:18,466 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.139e-01 2024-08-12 19:29:43,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1789180.0, ans=0.0 2024-08-12 19:29:47,987 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.488e+01 2.839e+01 3.204e+01 5.431e+01, threshold=5.678e+01, percent-clipped=1.0 2024-08-12 19:30:15,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1789380.0, ans=0.09899494936611666 2024-08-12 19:30:20,168 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.02 vs. limit=15.0 2024-08-12 19:30:24,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1789480.0, ans=0.1 2024-08-12 19:30:33,658 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 from AS 2024-08-12 19:30:38,174 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5050, loss[loss=0.08752, beats_loss=0.01538, ecapa_loss=0.000151, whisper_loss=0.07063, over 21806.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01094, ecapa_loss=0.0001752, whisper_loss=0.09247, over 3903538.23 frames. 
], batch size: 91, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:30:45,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2024-08-12 19:30:55,649 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 from AS 2024-08-12 19:31:02,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1789680.0, ans=0.0 2024-08-12 19:31:04,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1789680.0, ans=0.0 2024-08-12 19:31:05,130 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 16 from Vox, 35 from AS 2024-08-12 19:31:07,609 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=12.0 2024-08-12 19:31:11,365 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 from AS 2024-08-12 19:31:12,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=15.0 2024-08-12 19:31:16,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1789780.0, ans=0.1 2024-08-12 19:31:17,094 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 16 from Vox, 41 from AS 2024-08-12 19:31:18,388 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 from AS 2024-08-12 19:31:21,794 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS 2024-08-12 19:31:26,187 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
23 from LS+wenet, 23 from Vox, 25 from AS 2024-08-12 19:31:30,834 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 from AS 2024-08-12 19:31:36,802 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 28 from LS+wenet, 13 from Vox, 30 from AS 2024-08-12 19:31:43,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1789980.0, ans=0.125 2024-08-12 19:31:45,581 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 18 from Vox, 38 from AS 2024-08-12 19:31:54,009 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5100, loss[loss=0.1058, beats_loss=0.01214, ecapa_loss=0.0001436, whisper_loss=0.09224, over 23120.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01093, ecapa_loss=0.0001745, whisper_loss=0.09275, over 3911853.43 frames. ], batch size: 89, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:31:55,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1790080.0, ans=0.0 2024-08-12 19:31:58,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1790080.0, ans=0.125 2024-08-12 19:32:10,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1790180.0, ans=0.125 2024-08-12 19:32:14,906 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 19:32:19,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1790180.0, ans=0.2 2024-08-12 19:32:20,083 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.471e+01 2.779e+01 3.135e+01 9.153e+01, threshold=5.559e+01, percent-clipped=1.0 2024-08-12 19:32:32,738 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1790280.0, ans=0.0 2024-08-12 19:32:33,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1790380.0, ans=0.0 2024-08-12 19:32:36,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=15.0 2024-08-12 19:32:39,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1790380.0, ans=0.1 2024-08-12 19:32:59,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1790480.0, ans=0.0 2024-08-12 19:33:02,998 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5150, loss[loss=0.09981, beats_loss=0.01203, ecapa_loss=0.0001425, whisper_loss=0.08636, over 18515.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01101, ecapa_loss=0.000173, whisper_loss=0.09209, over 3898058.57 frames. ], batch size: 73, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:33:27,197 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 from AS 2024-08-12 19:33:35,104 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 from AS 2024-08-12 19:34:03,808 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 23 from Vox, 23 from AS 2024-08-12 19:34:04,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1790980.0, ans=0.125 2024-08-12 19:34:09,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1791080.0, ans=0.125 2024-08-12 19:34:10,399 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5200, loss[loss=0.1289, beats_loss=0.009431, ecapa_loss=0.0002324, whisper_loss=0.1172, over 13930.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01094, ecapa_loss=0.0001736, whisper_loss=0.09222, over 3890416.58 frames. ], batch size: 57, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:34:11,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-08-12 19:34:13,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1791080.0, ans=0.1 2024-08-12 19:34:21,202 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 from AS 2024-08-12 19:34:36,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.499e+01 2.713e+01 3.001e+01 1.517e+02, threshold=5.426e+01, percent-clipped=1.0 2024-08-12 19:34:56,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1791380.0, ans=0.125 2024-08-12 19:35:00,294 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 from AS 2024-08-12 19:35:18,651 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 19:35:19,370 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5250, loss[loss=0.1209, beats_loss=0.01063, ecapa_loss=0.0001631, whisper_loss=0.1087, over 22987.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.0001745, whisper_loss=0.09157, over 3879064.88 frames. ], batch size: 91, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:35:20,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.77 vs. 
limit=15.0 2024-08-12 19:35:21,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1791580.0, ans=0.125 2024-08-12 19:35:23,204 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 from AS 2024-08-12 19:35:40,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2024-08-12 19:35:46,266 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 19:35:48,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1791780.0, ans=0.0 2024-08-12 19:36:04,978 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 19 from Vox, 42 from AS 2024-08-12 19:36:05,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1791880.0, ans=0.125 2024-08-12 19:36:05,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1791880.0, ans=0.125 2024-08-12 19:36:28,603 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5300, loss[loss=0.09983, beats_loss=0.01123, ecapa_loss=0.0001552, whisper_loss=0.08705, over 17897.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01094, ecapa_loss=0.0001745, whisper_loss=0.09136, over 3889075.73 frames. ], batch size: 68, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:36:31,258 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 25 from Vox, 23 from AS 2024-08-12 19:36:35,406 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
28 from LS+wenet, 17 from Vox, 32 from AS 2024-08-12 19:36:54,230 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.416e+01 2.797e+01 3.236e+01 7.041e+01, threshold=5.594e+01, percent-clipped=1.0 2024-08-12 19:36:57,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1792280.0, ans=0.2 2024-08-12 19:36:57,722 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs. limit=6.0 2024-08-12 19:37:01,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1792280.0, ans=0.5 2024-08-12 19:37:02,245 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 19 from Vox, 24 from AS 2024-08-12 19:37:20,375 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=12.0 2024-08-12 19:37:31,896 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 27 from Vox, 33 from AS 2024-08-12 19:37:35,836 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5350, loss[loss=0.09716, beats_loss=0.01134, ecapa_loss=0.0001069, whisper_loss=0.08475, over 16226.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01094, ecapa_loss=0.000173, whisper_loss=0.0911, over 3887904.90 frames. ], batch size: 58, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:37:36,653 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=15.0 2024-08-12 19:37:45,452 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 from AS 2024-08-12 19:37:49,395 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 22 from Vox, 39 from AS 2024-08-12 19:37:50,693 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 from AS 2024-08-12 19:37:57,036 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-12 19:37:58,064 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 12 from Vox, 28 from AS 2024-08-12 19:37:58,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1792680.0, ans=0.125 2024-08-12 19:38:00,713 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 from AS 2024-08-12 19:38:11,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1792780.0, ans=0.125 2024-08-12 19:38:15,713 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 24 from Vox, 25 from AS 2024-08-12 19:38:36,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1792980.0, ans=0.125 2024-08-12 19:38:44,071 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5400, loss[loss=0.09917, beats_loss=0.0115, ecapa_loss=0.0001387, whisper_loss=0.08629, over 16519.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01095, ecapa_loss=0.0001738, whisper_loss=0.0909, over 3859157.62 frames. 
], batch size: 64, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:38:44,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1793080.0, ans=0.125 2024-08-12 19:39:08,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1793180.0, ans=0.2 2024-08-12 19:39:09,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2024-08-12 19:39:09,889 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.477e+01 2.760e+01 3.199e+01 8.149e+01, threshold=5.520e+01, percent-clipped=2.0 2024-08-12 19:39:11,320 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS 2024-08-12 19:39:11,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2024-08-12 19:39:26,220 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 17 from LS+wenet, 21 from Vox, 37 from AS 2024-08-12 19:39:36,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0 2024-08-12 19:39:51,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1793480.0, ans=0.0 2024-08-12 19:39:53,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5450, loss[loss=0.1014, beats_loss=0.01325, ecapa_loss=0.0001219, whisper_loss=0.08688, over 18002.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.0001734, whisper_loss=0.09148, over 3872494.88 frames. 
], batch size: 72, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:40:01,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1793580.0, ans=0.125 2024-08-12 19:40:08,308 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 from AS 2024-08-12 19:40:22,176 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 36 from LS+wenet, 23 from Vox, 27 from AS 2024-08-12 19:40:31,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1793780.0, ans=0.0 2024-08-12 19:41:01,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=1793980.0, ans=22.5 2024-08-12 19:41:05,217 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 19 from Vox, 39 from AS 2024-08-12 19:41:06,636 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5500, loss[loss=0.1067, beats_loss=0.01302, ecapa_loss=0.0001669, whisper_loss=0.09204, over 21471.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.0001735, whisper_loss=0.09148, over 3885004.95 frames. ], batch size: 86, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:41:24,581 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 from AS 2024-08-12 19:41:32,266 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.427e+01 2.827e+01 3.059e+01 4.853e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 19:41:41,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1794280.0, ans=0.0 2024-08-12 19:41:49,484 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 21 from LS+wenet, 16 from Vox, 19 from AS 2024-08-12 19:41:56,238 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
13 from LS+wenet, 16 from Vox, 24 from AS 2024-08-12 19:42:11,876 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 28 from LS+wenet, 27 from Vox, 41 from AS 2024-08-12 19:42:13,255 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 20 from Vox, 26 from AS 2024-08-12 19:42:17,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1794480.0, ans=0.04949747468305833 2024-08-12 19:42:24,257 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5550, loss[loss=0.09736, beats_loss=0.00746, ecapa_loss=0.0002383, whisper_loss=0.08752, over 16705.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01095, ecapa_loss=0.0001749, whisper_loss=0.09128, over 3889531.84 frames. ], batch size: 69, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:42:37,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1794580.0, ans=0.0 2024-08-12 19:42:42,118 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 from AS 2024-08-12 19:42:47,934 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 from AS 2024-08-12 19:43:03,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1794780.0, ans=0.2 2024-08-12 19:43:11,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1794780.0, ans=0.125 2024-08-12 19:43:15,031 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 from AS 2024-08-12 19:43:44,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1794980.0, ans=0.125 2024-08-12 19:43:45,976 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
18 from LS+wenet, 28 from Vox, 48 from AS 2024-08-12 19:43:47,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1794980.0, ans=0.125 2024-08-12 19:43:49,801 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5600, loss[loss=0.09502, beats_loss=0.01136, ecapa_loss=0.0001422, whisper_loss=0.08224, over 15426.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01103, ecapa_loss=0.0001735, whisper_loss=0.09057, over 3887526.87 frames. ], batch size: 59, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:43:52,271 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 20 from LS+wenet, 24 from Vox, 44 from AS 2024-08-12 19:43:57,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2024-08-12 19:44:08,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1795180.0, ans=0.125 2024-08-12 19:44:24,203 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.502e+01 2.768e+01 3.142e+01 4.658e+01, threshold=5.536e+01, percent-clipped=0.0 2024-08-12 19:44:38,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.98 vs. limit=10.0 2024-08-12 19:45:00,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1795380.0, ans=0.0 2024-08-12 19:45:01,526 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 23 from Vox, 38 from AS 2024-08-12 19:45:13,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1795480.0, ans=0.1 2024-08-12 19:45:21,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1795580.0, ans=0.04949747468305833 2024-08-12 19:45:22,780 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5650, loss[loss=0.1208, beats_loss=0.01123, ecapa_loss=0.0001357, whisper_loss=0.1082, over 24000.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01102, ecapa_loss=0.0001729, whisper_loss=0.09131, over 3878518.82 frames. ], batch size: 92, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:45:43,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.35 vs. limit=15.0 2024-08-12 19:45:43,575 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 from AS 2024-08-12 19:45:55,710 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 30 from Vox, 34 from AS 2024-08-12 19:46:13,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1795780.0, ans=0.1 2024-08-12 19:46:17,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1795880.0, ans=0.125 2024-08-12 19:46:21,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.91 vs. 
limit=15.0 2024-08-12 19:46:44,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1795980.0, ans=0.1 2024-08-12 19:46:48,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1795980.0, ans=0.1 2024-08-12 19:46:52,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=1795980.0, ans=12.0 2024-08-12 19:46:57,003 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5700, loss[loss=0.09783, beats_loss=0.009385, ecapa_loss=0.0002252, whisper_loss=0.08619, over 14060.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01105, ecapa_loss=0.0001729, whisper_loss=0.09121, over 3901390.08 frames. ], batch size: 58, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:46:57,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1796080.0, ans=0.125 2024-08-12 19:47:06,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1796080.0, ans=0.125 2024-08-12 19:47:26,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.60 vs. 
limit=10.0 2024-08-12 19:47:27,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1796180.0, ans=0.0 2024-08-12 19:47:33,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.533e+01 2.876e+01 3.216e+01 4.377e+01, threshold=5.753e+01, percent-clipped=0.0 2024-08-12 19:47:48,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1796280.0, ans=0.0 2024-08-12 19:48:30,905 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5750, loss[loss=0.1021, beats_loss=0.01173, ecapa_loss=0.0001635, whisper_loss=0.0887, over 21564.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01116, ecapa_loss=0.0001723, whisper_loss=0.09026, over 3858012.02 frames. ], batch size: 88, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:48:32,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1796580.0, ans=0.05 2024-08-12 19:48:33,649 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 from AS 2024-08-12 19:48:38,939 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 from AS 2024-08-12 19:48:58,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1796680.0, ans=0.125 2024-08-12 19:48:59,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1796680.0, ans=0.0 2024-08-12 19:49:11,226 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 40 from LS+wenet, 18 from Vox, 32 from AS 2024-08-12 19:49:18,703 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 from AS 2024-08-12 19:49:24,958 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
23 from LS+wenet, 25 from Vox, 39 from AS 2024-08-12 19:49:33,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1796880.0, ans=0.0 2024-08-12 19:49:46,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1796880.0, ans=0.0 2024-08-12 19:49:48,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1796980.0, ans=0.02 2024-08-12 19:50:00,491 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 21 from Vox, 43 from AS 2024-08-12 19:50:01,506 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5800, loss[loss=0.0978, beats_loss=0.01267, ecapa_loss=0.0001422, whisper_loss=0.08371, over 22627.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01106, ecapa_loss=0.0001738, whisper_loss=0.09056, over 3826591.00 frames. ], batch size: 87, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:50:03,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1797080.0, ans=0.2 2024-08-12 19:50:07,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1797080.0, ans=0.0 2024-08-12 19:50:12,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1797080.0, ans=0.125 2024-08-12 19:50:27,799 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.440e+01 2.724e+01 3.167e+01 6.575e+01, threshold=5.447e+01, percent-clipped=2.0 2024-08-12 19:50:51,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1797380.0, ans=0.125 2024-08-12 19:50:58,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, 
batch_count=1797480.0, ans=0.0 2024-08-12 19:50:58,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1797480.0, ans=0.125 2024-08-12 19:51:11,545 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 from AS 2024-08-12 19:51:12,133 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2024-08-12 19:51:14,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5850, loss[loss=0.1039, beats_loss=0.0109, ecapa_loss=0.0002083, whisper_loss=0.09088, over 22101.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01103, ecapa_loss=0.0001744, whisper_loss=0.09115, over 3857619.54 frames. ], batch size: 94, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:51:31,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1797680.0, ans=0.125 2024-08-12 19:51:35,026 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 from AS 2024-08-12 19:51:35,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1797680.0, ans=0.2 2024-08-12 19:51:37,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1797680.0, ans=0.125 2024-08-12 19:51:55,653 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2024-08-12 19:52:01,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1797880.0, ans=0.125 2024-08-12 19:52:02,325 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 24 from Vox, 35 from AS 2024-08-12 19:52:19,107 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 27 from Vox, 31 from AS 2024-08-12 19:52:26,320 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5900, loss[loss=0.1002, beats_loss=0.0112, ecapa_loss=0.0002028, whisper_loss=0.08696, over 22026.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01104, ecapa_loss=0.0001731, whisper_loss=0.09117, over 3856087.43 frames. ], batch size: 93, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:52:33,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1798080.0, ans=0.1 2024-08-12 19:52:37,057 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-12 19:52:51,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1798180.0, ans=0.125 2024-08-12 19:52:54,636 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.654e+01 2.967e+01 3.336e+01 4.788e+01, threshold=5.934e+01, percent-clipped=0.0 2024-08-12 19:52:59,345 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 from AS 2024-08-12 19:53:11,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1798380.0, ans=0.125 2024-08-12 19:53:38,603 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 5950, loss[loss=0.1159, beats_loss=0.0108, ecapa_loss=0.0001806, whisper_loss=0.1033, over 19745.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01108, ecapa_loss=0.0001736, whisper_loss=0.09053, over 3883889.64 frames. ], batch size: 78, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:53:45,513 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 19 from Vox, 46 from AS 2024-08-12 19:53:51,380 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 from AS 2024-08-12 19:54:06,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1798680.0, ans=0.0 2024-08-12 19:54:12,823 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 from AS 2024-08-12 19:54:14,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1798780.0, ans=0.0 2024-08-12 19:54:22,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1798780.0, ans=0.0 2024-08-12 19:54:24,791 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 from AS 2024-08-12 19:54:29,714 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 23 from Vox, 43 from AS 2024-08-12 19:54:52,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1798980.0, ans=0.0 2024-08-12 19:54:53,910 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 from AS 2024-08-12 19:54:55,079 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6000, loss[loss=0.1111, beats_loss=0.01202, ecapa_loss=0.0001683, whisper_loss=0.09745, over 23654.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01099, ecapa_loss=0.0001732, whisper_loss=0.09202, over 3927966.92 frames. ], batch size: 94, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:54:55,080 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 19:55:33,552 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on ASR_libri: loss=0.2545, beats_loss=0, ecapa_loss=0.0005899, whisper_loss=0.2486, over 922467.00 frames. 
2024-08-12 19:55:47,950 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.7155, 4.5297, 3.6611, 4.0953], device='cuda:2') 2024-08-12 19:55:50,109 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on SV_voxceleb1: loss=0.004696, beats_loss=0, ecapa_loss=0.0004696, whisper_loss=0, over 939242.00 frames. 2024-08-12 19:57:46,570 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on AT_audioset: loss=0.02428, beats_loss=0.02428, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 19:57:46,574 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 19:57:52,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1799080.0, ans=0.0 2024-08-12 19:57:53,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1799080.0, ans=0.0 2024-08-12 19:57:57,729 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 from AS 2024-08-12 19:58:01,201 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.53 vs. 
limit=15.0 2024-08-12 19:58:05,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1799180.0, ans=0.2 2024-08-12 19:58:16,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.501e+01 2.791e+01 3.141e+01 5.827e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-12 19:58:38,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1799380.0, ans=0.0 2024-08-12 19:58:59,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1799480.0, ans=0.125 2024-08-12 19:59:03,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1799580.0, ans=0.1 2024-08-12 19:59:04,322 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6050, loss[loss=0.09222, beats_loss=0.01314, ecapa_loss=0.0001389, whisper_loss=0.07769, over 22412.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01097, ecapa_loss=0.000173, whisper_loss=0.09182, over 3911892.04 frames. ], batch size: 89, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:59:10,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1799580.0, ans=0.07 2024-08-12 19:59:23,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1799680.0, ans=0.125 2024-08-12 19:59:24,577 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
25 from LS+wenet, 23 from Vox, 37 from AS 2024-08-12 19:59:36,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1799780.0, ans=0.2 2024-08-12 19:59:36,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=1799780.0, ans=0.1 2024-08-12 19:59:43,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1799780.0, ans=0.1 2024-08-12 19:59:47,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1799780.0, ans=0.125 2024-08-12 19:59:50,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2024-08-12 19:59:52,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1799880.0, ans=0.125 2024-08-12 20:00:04,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1799880.0, ans=0.0 2024-08-12 20:00:04,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1799880.0, ans=0.125 2024-08-12 20:00:07,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2024-08-12 20:00:15,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1799980.0, ans=0.125 2024-08-12 20:00:24,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6100, loss[loss=0.1072, beats_loss=0.0111, ecapa_loss=0.0001686, whisper_loss=0.09444, over 18275.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01095, ecapa_loss=0.0001734, whisper_loss=0.09211, over 3921166.49 frames. ], batch size: 71, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:00:55,121 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.407e+01 2.685e+01 3.141e+01 4.380e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-12 20:00:56,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1800280.0, ans=0.0 2024-08-12 20:01:00,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1800280.0, ans=0.0 2024-08-12 20:01:11,650 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 15 from Vox, 40 from AS 2024-08-12 20:01:32,936 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:01:35,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1800480.0, ans=0.125 2024-08-12 20:01:39,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1800480.0, ans=0.125 2024-08-12 20:01:42,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6150, loss[loss=0.1096, beats_loss=0.01006, ecapa_loss=0.0001861, whisper_loss=0.09765, over 16746.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01096, ecapa_loss=0.0001737, whisper_loss=0.09217, over 3911886.07 frames. ], batch size: 67, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:01:44,992 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
23 from LS+wenet, 26 from Vox, 36 from AS 2024-08-12 20:01:50,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1800580.0, ans=0.125 2024-08-12 20:01:51,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1800580.0, ans=0.0 2024-08-12 20:02:00,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-08-12 20:02:03,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.92 vs. limit=15.0 2024-08-12 20:02:07,212 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 18 from Vox, 36 from AS 2024-08-12 20:02:10,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1800780.0, ans=0.0 2024-08-12 20:02:36,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1800880.0, ans=0.125 2024-08-12 20:02:39,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1800880.0, ans=0.125 2024-08-12 20:02:52,007 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 23 from Vox, 42 from AS 2024-08-12 20:02:58,079 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6200, loss[loss=0.1076, beats_loss=0.009224, ecapa_loss=0.0001637, whisper_loss=0.09674, over 14551.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001736, whisper_loss=0.09198, over 3887434.35 frames. ], batch size: 54, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:03:00,346 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-12 20:03:08,336 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 20 from Vox, 34 from AS 2024-08-12 20:03:08,983 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-08-12 20:03:25,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1801180.0, ans=0.0 2024-08-12 20:03:26,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1801280.0, ans=0.1 2024-08-12 20:03:27,801 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.462e+01 2.878e+01 3.273e+01 2.094e+02, threshold=5.757e+01, percent-clipped=3.0 2024-08-12 20:03:42,867 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=15.0 2024-08-12 20:03:59,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1801480.0, ans=0.1 2024-08-12 20:04:02,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1801480.0, ans=0.125 2024-08-12 20:04:13,896 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6250, loss[loss=0.08893, beats_loss=0.01319, ecapa_loss=0.0001876, whisper_loss=0.07386, over 20469.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01086, ecapa_loss=0.0001738, whisper_loss=0.09206, over 3885679.78 frames. 
], batch size: 86, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:04:43,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1801780.0, ans=0.1 2024-08-12 20:04:43,388 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2024-08-12 20:04:51,247 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 from AS 2024-08-12 20:05:00,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1801880.0, ans=0.5 2024-08-12 20:05:20,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1801980.0, ans=0.125 2024-08-12 20:05:28,342 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6300, loss[loss=0.1138, beats_loss=0.01144, ecapa_loss=0.0001578, whisper_loss=0.1008, over 23721.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01086, ecapa_loss=0.0001736, whisper_loss=0.09226, over 3901165.28 frames. ], batch size: 92, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:05:39,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1802080.0, ans=0.125 2024-08-12 20:05:45,103 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 from AS 2024-08-12 20:05:47,393 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2024-08-12 20:05:50,625 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 from AS 2024-08-12 20:05:54,169 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 
24 from LS+wenet, 23 from Vox, 49 from AS 2024-08-12 20:05:57,989 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.436e+01 2.696e+01 3.138e+01 5.310e+01, threshold=5.392e+01, percent-clipped=0.0 2024-08-12 20:05:59,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1802280.0, ans=0.125 2024-08-12 20:06:43,459 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6350, loss[loss=0.1277, beats_loss=0.01061, ecapa_loss=0.0001515, whisper_loss=0.1156, over 24214.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001729, whisper_loss=0.09194, over 3918118.20 frames. ], batch size: 92, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:06:50,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.97 vs. limit=10.0 2024-08-12 20:06:56,661 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 from AS 2024-08-12 20:07:04,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1802680.0, ans=0.09899494936611666 2024-08-12 20:07:30,244 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 from AS 2024-08-12 20:07:47,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. 
limit=6.0 2024-08-12 20:07:56,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1803080.0, ans=0.125 2024-08-12 20:07:56,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1803080.0, ans=0.2 2024-08-12 20:07:57,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6400, loss[loss=0.1082, beats_loss=0.01064, ecapa_loss=0.0001993, whisper_loss=0.09554, over 19607.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0108, ecapa_loss=0.0001744, whisper_loss=0.09239, over 3907903.84 frames. ], batch size: 82, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:07:59,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1803080.0, ans=0.125 2024-08-12 20:08:16,478 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 from AS 2024-08-12 20:08:16,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1803180.0, ans=0.04949747468305833 2024-08-12 20:08:18,372 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:08:24,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5 2024-08-12 20:08:24,773 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.558e+01 2.846e+01 3.413e+01 1.173e+02, threshold=5.692e+01, percent-clipped=2.0 2024-08-12 20:08:28,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1803280.0, ans=0.125 2024-08-12 20:08:41,183 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
32 from LS+wenet, 20 from Vox, 41 from AS 2024-08-12 20:08:43,641 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 22 from LS+wenet, 22 from Vox, 41 from AS 2024-08-12 20:08:45,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1803380.0, ans=0.125 2024-08-12 20:08:51,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1803380.0, ans=0.1 2024-08-12 20:09:08,422 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6450, loss[loss=0.1262, beats_loss=0.009311, ecapa_loss=0.0001802, whisper_loss=0.1151, over 20539.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.0001743, whisper_loss=0.09201, over 3918555.49 frames. ], batch size: 79, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:09:16,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1803580.0, ans=0.125 2024-08-12 20:09:18,069 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 24 from Vox, 31 from AS 2024-08-12 20:09:25,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1803680.0, ans=0.125 2024-08-12 20:09:28,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1803680.0, ans=0.07 2024-08-12 20:09:36,979 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 24 from Vox, 35 from AS 2024-08-12 20:09:52,538 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.749e-01 2024-08-12 20:09:58,003 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. 
limit=6.0 2024-08-12 20:10:02,397 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 18 from Vox, 44 from AS 2024-08-12 20:10:09,679 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 from AS 2024-08-12 20:10:15,152 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 21 from Vox, 17 from AS 2024-08-12 20:10:16,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1803980.0, ans=0.125 2024-08-12 20:10:18,970 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 from AS 2024-08-12 20:10:20,051 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6500, loss[loss=0.09728, beats_loss=0.01097, ecapa_loss=0.000131, whisper_loss=0.085, over 19038.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01096, ecapa_loss=0.0001736, whisper_loss=0.09173, over 3899203.80 frames. ], batch size: 71, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:10:44,676 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 from AS 2024-08-12 20:10:48,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.417e+01 2.617e+01 2.819e+01 4.970e+01, threshold=5.233e+01, percent-clipped=0.0 2024-08-12 20:10:59,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1804280.0, ans=0.125 2024-08-12 20:11:30,767 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6550, loss[loss=0.1011, beats_loss=0.01143, ecapa_loss=0.0001571, whisper_loss=0.0881, over 23233.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001729, whisper_loss=0.09156, over 3908816.44 frames. 
], batch size: 92, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:11:38,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1804580.0, ans=0.125 2024-08-12 20:11:38,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1804580.0, ans=0.2 2024-08-12 20:11:51,517 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 22 from Vox, 23 from AS 2024-08-12 20:12:09,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1804780.0, ans=0.0 2024-08-12 20:12:17,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1804880.0, ans=0.125 2024-08-12 20:12:20,678 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 34 from LS+wenet, 20 from Vox, 40 from AS 2024-08-12 20:12:34,301 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 15 from LS+wenet, 25 from Vox, 37 from AS 2024-08-12 20:12:35,601 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 from AS 2024-08-12 20:12:39,698 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6600, loss[loss=0.1144, beats_loss=0.01109, ecapa_loss=0.0001505, whisper_loss=0.1018, over 19948.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01098, ecapa_loss=0.0001736, whisper_loss=0.09224, over 3938893.01 frames. 
], batch size: 80, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:12:46,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1805080.0, ans=0.125 2024-08-12 20:12:50,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1805080.0, ans=0.125 2024-08-12 20:12:55,426 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.12 vs. limit=22.5 2024-08-12 20:12:57,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-12 20:13:03,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1805180.0, ans=0.2 2024-08-12 20:13:06,740 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.480e+01 2.766e+01 3.110e+01 5.063e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 20:13:07,028 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 15 from Vox, 24 from AS 2024-08-12 20:13:22,198 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 15 from Vox, 36 from AS 2024-08-12 20:13:34,164 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 from AS 2024-08-12 20:13:41,806 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.89 vs. limit=22.5 2024-08-12 20:13:44,649 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.86 vs. 
limit=15.0 2024-08-12 20:13:45,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1805480.0, ans=0.125 2024-08-12 20:13:47,713 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6650, loss[loss=0.1182, beats_loss=0.009415, ecapa_loss=0.0001665, whisper_loss=0.1071, over 19657.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01103, ecapa_loss=0.0001726, whisper_loss=0.09201, over 3965686.11 frames. ], batch size: 78, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:14:33,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1805880.0, ans=0.125 2024-08-12 20:14:48,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1805980.0, ans=0.0 2024-08-12 20:14:51,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1805980.0, ans=0.0 2024-08-12 20:14:56,301 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6700, loss[loss=0.1177, beats_loss=0.01028, ecapa_loss=0.0001339, whisper_loss=0.1061, over 17803.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01087, ecapa_loss=0.0001729, whisper_loss=0.09287, over 3928819.23 frames. ], batch size: 63, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:15:02,603 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2024-08-12 20:15:18,274 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-12 20:15:19,572 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 20:15:23,562 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.568e+01 2.820e+01 3.306e+01 6.884e+01, threshold=5.641e+01, percent-clipped=3.0 2024-08-12 20:15:41,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1806380.0, ans=0.5 2024-08-12 20:15:44,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1806380.0, ans=0.125 2024-08-12 20:15:45,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1806380.0, ans=0.125 2024-08-12 20:15:56,160 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-12 20:16:05,714 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6750, loss[loss=0.112, beats_loss=0.008891, ecapa_loss=0.0001852, whisper_loss=0.1013, over 22637.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01081, ecapa_loss=0.0001743, whisper_loss=0.0924, over 3880718.92 frames. ], batch size: 91, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:16:14,214 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-12 20:16:15,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1806580.0, ans=0.125 2024-08-12 20:16:18,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1806680.0, ans=0.125 2024-08-12 20:16:20,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1806680.0, ans=0.0 2024-08-12 20:16:26,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.23 vs. 
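The `optim.py` lines above report `Clipping_scale=2.0`, five grad-norm quartiles (min, Q1, median, Q3, max), a clipping `threshold`, and `percent-clipped`. In these logs the threshold consistently sits at about `clipping_scale` times the median grad norm (e.g. `2.0 * 2.766e+01 ≈ 5.533e+01`), so a plausible sketch is adaptive clipping at that multiple of the running median. Names here are hypothetical, not icefall's actual `optim.py` API:

```python
# Hedged sketch of quartile-based gradient clipping as suggested by the log:
# threshold = clipping_scale * median of recently observed grad norms,
# and percent-clipped = fraction of recent norms above that threshold.
import statistics

def clip_threshold(recent_grad_norms, clipping_scale=2.0):
    """Clipping threshold as a multiple of the running median grad norm."""
    return clipping_scale * statistics.median(recent_grad_norms)

def percent_clipped(recent_grad_norms, threshold):
    """Percentage of recent gradient norms exceeding the threshold."""
    over = sum(1 for g in recent_grad_norms if g > threshold)
    return 100.0 * over / len(recent_grad_norms)

# Using the quartiles logged above as a stand-in sample of recent norms:
norms = [19.03, 24.80, 27.66, 31.10, 50.63]
t = clip_threshold(norms)          # 2.0 * 27.66 = 55.32, near the logged 5.533e+01
p = percent_clipped(norms, t)      # all norms below threshold -> 0.0
```

This matches the logged `percent-clipped=0.0` for batches where even the max norm stays under the threshold, and the occasional nonzero values (e.g. `percent-clipped=3.0`) where the max spikes above it.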
limit=10.0 2024-08-12 20:16:27,042 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 20:16:30,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1806680.0, ans=0.1 2024-08-12 20:16:49,595 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 20:16:51,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1806880.0, ans=0.0 2024-08-12 20:16:53,707 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 20:16:53,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1806880.0, ans=0.125 2024-08-12 20:16:54,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.43 vs. limit=22.5 2024-08-12 20:17:04,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.42 vs. limit=15.0 2024-08-12 20:17:05,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2024-08-12 20:17:06,135 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 20:17:15,494 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6800, loss[loss=0.1088, beats_loss=0.01101, ecapa_loss=0.0001762, whisper_loss=0.09606, over 17578.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01087, ecapa_loss=0.0001741, whisper_loss=0.09252, over 3875912.07 frames. ], batch size: 70, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:17:17,007 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 20:17:34,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-12 20:17:43,022 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.433e+01 2.678e+01 3.224e+01 5.136e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-12 20:17:43,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.91 vs. limit=22.5 2024-08-12 20:18:05,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1807380.0, ans=0.125 2024-08-12 20:18:15,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2024-08-12 20:18:24,713 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6850, loss[loss=0.1206, beats_loss=0.008753, ecapa_loss=0.0002086, whisper_loss=0.1098, over 17895.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01093, ecapa_loss=0.0001736, whisper_loss=0.09196, over 3876335.01 frames. ], batch size: 72, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:18:24,907 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 39 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-12 20:18:54,437 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 20:19:11,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.43 vs. 
limit=10.0 2024-08-12 20:19:24,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1807980.0, ans=0.125 2024-08-12 20:19:25,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. limit=10.0 2024-08-12 20:19:27,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1807980.0, ans=0.125 2024-08-12 20:19:33,779 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6900, loss[loss=0.08605, beats_loss=0.01287, ecapa_loss=0.0001676, whisper_loss=0.07151, over 22303.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01101, ecapa_loss=0.000173, whisper_loss=0.09122, over 3890026.21 frames. ], batch size: 96, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:19:39,877 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-12 20:19:41,318 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-12 20:19:48,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1808180.0, ans=0.0 2024-08-12 20:19:52,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1808180.0, ans=0.125 2024-08-12 20:20:01,724 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.057e+01 2.402e+01 2.709e+01 3.139e+01 1.091e+02, threshold=5.419e+01, percent-clipped=1.0 2024-08-12 20:20:10,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.83 vs. 
limit=15.0 2024-08-12 20:20:21,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2024-08-12 20:20:24,675 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 20:20:26,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1808380.0, ans=0.125 2024-08-12 20:20:27,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1808480.0, ans=0.125 2024-08-12 20:20:33,850 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 40 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 20:20:41,725 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 6950, loss[loss=0.08674, beats_loss=0.01412, ecapa_loss=0.0001573, whisper_loss=0.07104, over 18584.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01106, ecapa_loss=0.0001705, whisper_loss=0.09153, over 3882425.10 frames. ], batch size: 75, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:21:02,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1808680.0, ans=0.07 2024-08-12 20:21:03,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1808680.0, ans=0.1 2024-08-12 20:21:08,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1808680.0, ans=0.125 2024-08-12 20:21:19,832 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
22 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 20:21:22,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1808880.0, ans=0.125 2024-08-12 20:21:32,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1808880.0, ans=0.0 2024-08-12 20:21:35,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1808880.0, ans=0.125 2024-08-12 20:21:52,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7000, loss[loss=0.114, beats_loss=0.01086, ecapa_loss=0.0001656, whisper_loss=0.1014, over 18428.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0111, ecapa_loss=0.0001718, whisper_loss=0.09148, over 3901841.74 frames. ], batch size: 74, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:21:56,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1809080.0, ans=0.125 2024-08-12 20:22:00,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1809080.0, ans=0.1 2024-08-12 20:22:19,778 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.381e+01 2.667e+01 3.091e+01 4.298e+01, threshold=5.335e+01, percent-clipped=0.0 2024-08-12 20:22:24,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1809280.0, ans=0.0 2024-08-12 20:22:24,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1809280.0, ans=0.0 2024-08-12 20:22:57,553 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
21 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-12 20:22:57,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1809480.0, ans=0.2 2024-08-12 20:23:01,536 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7050, loss[loss=0.1134, beats_loss=0.008833, ecapa_loss=0.0001924, whisper_loss=0.1027, over 21791.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01106, ecapa_loss=0.0001732, whisper_loss=0.09136, over 3888355.76 frames. ], batch size: 90, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:23:03,739 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.00 vs. limit=15.0 2024-08-12 20:23:07,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1809580.0, ans=0.125 2024-08-12 20:23:08,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1809580.0, ans=0.2 2024-08-12 20:23:08,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1809580.0, ans=0.125 2024-08-12 20:23:11,307 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-12 20:23:16,655 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 20:23:31,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1809780.0, ans=0.125 2024-08-12 20:23:49,467 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 20:24:01,338 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 18 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 20:24:02,817 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
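The frequent `ScheduledFloat` lines (from `scaling.py`) report float hyperparameters, such as balancer probabilities and skip rates, whose current value depends on `batch_count`. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between schedule points; the points below are illustrative, not the ones used by icefall's `scaling.py`:

```python
# Hedged sketch of a ScheduledFloat-style hyperparameter: a value that is
# piecewise-linear in batch_count, clamped at both ends of the schedule.
def scheduled_float(batch_count, points):
    """points: sorted (batch_count, value) pairs; linear interpolation."""
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            frac = (batch_count - x0) / (x1 - x0)
            return y0 + frac * (y1 - y0)
    return points[-1][1]

# e.g. a skip-rate-like value decaying from 0.3 to 0.1 over the first
# 20k batches (hypothetical schedule points)
rate = scheduled_float(10000, [(0.0, 0.3), (20000.0, 0.1)])
```

This explains why the same parameter name (e.g. `...conv_skip_rate`) is logged repeatedly with the current `batch_count`: the value is re-evaluated as training progresses.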
28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 20:24:04,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1809980.0, ans=0.0 2024-08-12 20:24:10,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7100, loss[loss=0.1029, beats_loss=0.009846, ecapa_loss=0.0002011, whisper_loss=0.09101, over 14465.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01107, ecapa_loss=0.0001717, whisper_loss=0.09158, over 3892499.92 frames. ], batch size: 58, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:24:13,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2024-08-12 20:24:25,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1810180.0, ans=0.025 2024-08-12 20:24:36,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1810180.0, ans=0.125 2024-08-12 20:24:38,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.554e+01 2.752e+01 3.133e+01 4.741e+01, threshold=5.504e+01, percent-clipped=0.0 2024-08-12 20:24:49,420 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 20:25:11,333 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 20:25:19,342 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7150, loss[loss=0.1052, beats_loss=0.01115, ecapa_loss=0.0001884, whisper_loss=0.0922, over 20210.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01111, ecapa_loss=0.000171, whisper_loss=0.09088, over 3891770.48 frames. 
], batch size: 81, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:25:29,577 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.96 vs. limit=15.0 2024-08-12 20:25:35,814 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 20:25:49,562 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-12 20:25:58,885 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:26:05,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1810880.0, ans=0.1 2024-08-12 20:26:08,241 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 36 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-12 20:26:28,797 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7200, loss[loss=0.0892, beats_loss=0.01176, ecapa_loss=0.000215, whisper_loss=0.07529, over 20565.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01114, ecapa_loss=0.0001718, whisper_loss=0.0905, over 3886893.59 frames. ], batch size: 90, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:26:31,646 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 20:26:40,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1811080.0, ans=0.125 2024-08-12 20:26:50,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.76 vs. 
limit=22.5 2024-08-12 20:26:55,631 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.470e+01 2.758e+01 3.060e+01 4.587e+01, threshold=5.516e+01, percent-clipped=0.0 2024-08-12 20:26:58,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1811280.0, ans=0.125 2024-08-12 20:26:59,250 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.63 vs. limit=22.5 2024-08-12 20:27:10,114 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.619e+05 2024-08-12 20:27:10,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1811380.0, ans=0.125 2024-08-12 20:27:14,064 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 20:27:16,195 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.22 vs. limit=8.0 2024-08-12 20:27:21,455 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.41 vs. limit=22.5 2024-08-12 20:27:28,953 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-12 20:27:37,036 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7250, loss[loss=0.1047, beats_loss=0.0128, ecapa_loss=0.0001083, whisper_loss=0.09085, over 22855.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01118, ecapa_loss=0.0001707, whisper_loss=0.09041, over 3894188.97 frames. ], batch size: 86, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:27:42,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.41 vs. 
limit=15.0 2024-08-12 20:27:46,306 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.71 vs. limit=22.5 2024-08-12 20:28:03,080 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 20:28:16,776 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.03 vs. limit=10.0 2024-08-12 20:28:36,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1811980.0, ans=0.125 2024-08-12 20:28:43,369 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-12 20:28:47,396 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7300, loss[loss=0.1005, beats_loss=0.01088, ecapa_loss=0.0001696, whisper_loss=0.08793, over 16938.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01115, ecapa_loss=0.00017, whisper_loss=0.09077, over 3874999.50 frames. ], batch size: 68, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:28:54,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1812080.0, ans=0.125 2024-08-12 20:28:56,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.46 vs. limit=15.0 2024-08-12 20:29:02,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1812180.0, ans=0.1 2024-08-12 20:29:14,955 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.458e+01 2.787e+01 3.037e+01 3.790e+01, threshold=5.575e+01, percent-clipped=0.0 2024-08-12 20:29:23,568 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
36 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 20:29:26,398 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-12 20:29:29,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1812380.0, ans=0.125 2024-08-12 20:29:37,235 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 36 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 20:29:56,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7350, loss[loss=0.1147, beats_loss=0.009883, ecapa_loss=0.0001619, whisper_loss=0.1032, over 21654.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01106, ecapa_loss=0.0001711, whisper_loss=0.09145, over 3889994.19 frames. ], batch size: 85, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:30:17,660 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-12 20:30:36,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. 
limit=15.0 2024-08-12 20:30:37,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1812880.0, ans=0.0 2024-08-12 20:30:41,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1812880.0, ans=0.0 2024-08-12 20:30:44,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1812880.0, ans=0.125 2024-08-12 20:30:55,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1812980.0, ans=0.125 2024-08-12 20:31:02,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1812980.0, ans=0.0 2024-08-12 20:31:04,883 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7400, loss[loss=0.1096, beats_loss=0.01153, ecapa_loss=0.0001712, whisper_loss=0.09638, over 21575.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01105, ecapa_loss=0.0001709, whisper_loss=0.09143, over 3864179.58 frames. ], batch size: 88, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:31:07,470 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.73 vs. limit=10.0 2024-08-12 20:31:17,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.50 vs. 
limit=15.0 2024-08-12 20:31:26,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1813180.0, ans=0.125 2024-08-12 20:31:32,385 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.493e+01 2.726e+01 3.079e+01 4.243e+01, threshold=5.453e+01, percent-clipped=0.0 2024-08-12 20:31:37,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1813280.0, ans=0.0 2024-08-12 20:31:49,272 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-12 20:31:53,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1813380.0, ans=0.125 2024-08-12 20:31:55,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1813380.0, ans=0.125 2024-08-12 20:32:01,365 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 24 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-12 20:32:13,719 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7450, loss[loss=0.09574, beats_loss=0.01144, ecapa_loss=0.0001801, whisper_loss=0.0825, over 16132.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01101, ecapa_loss=0.0001711, whisper_loss=0.09189, over 3861600.43 frames. 
], batch size: 65, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:32:22,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1813580.0, ans=0.1 2024-08-12 20:32:31,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1813680.0, ans=0.125 2024-08-12 20:33:08,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1813980.0, ans=0.0 2024-08-12 20:33:14,691 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 20:33:17,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1813980.0, ans=0.125 2024-08-12 20:33:21,747 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7500, loss[loss=0.1013, beats_loss=0.01073, ecapa_loss=0.0001765, whisper_loss=0.08877, over 22831.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01104, ecapa_loss=0.0001705, whisper_loss=0.09162, over 3879314.23 frames. ], batch size: 90, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:33:23,341 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 20:33:31,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1814080.0, ans=0.1 2024-08-12 20:33:34,347 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 20:33:45,428 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 20:33:49,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.399e+01 2.676e+01 3.018e+01 5.657e+01, threshold=5.351e+01, percent-clipped=1.0 2024-08-12 20:33:53,511 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 20:33:58,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1814280.0, ans=0.0 2024-08-12 20:34:15,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1814480.0, ans=0.0 2024-08-12 20:34:31,175 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7550, loss[loss=0.1065, beats_loss=0.01135, ecapa_loss=0.000159, whisper_loss=0.09359, over 13656.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01105, ecapa_loss=0.0001716, whisper_loss=0.09106, over 3851541.56 frames. ], batch size: 54, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:34:35,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1814580.0, ans=0.0 2024-08-12 20:34:37,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1814580.0, ans=0.125 2024-08-12 20:34:45,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1814680.0, ans=0.125 2024-08-12 20:35:40,649 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7600, loss[loss=0.1221, beats_loss=0.008063, ecapa_loss=0.0001952, whisper_loss=0.1121, over 21630.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01103, ecapa_loss=0.0001712, whisper_loss=0.09059, over 3830066.82 frames. 
], batch size: 84, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:35:54,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.07 vs. limit=10.0 2024-08-12 20:35:56,512 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 20:36:00,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1815180.0, ans=0.125 2024-08-12 20:36:03,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1815180.0, ans=0.0 2024-08-12 20:36:08,618 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.568e+01 2.871e+01 3.338e+01 1.735e+02, threshold=5.742e+01, percent-clipped=2.0 2024-08-12 20:36:22,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1815380.0, ans=0.2 2024-08-12 20:36:30,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1815380.0, ans=0.0 2024-08-12 20:36:35,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1815480.0, ans=0.0 2024-08-12 20:36:39,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1815480.0, ans=0.0 2024-08-12 20:36:39,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1815480.0, ans=0.1 2024-08-12 20:36:46,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1815480.0, ans=0.125 2024-08-12 20:36:50,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7650, loss[loss=0.1018, beats_loss=0.008242, ecapa_loss=0.0001819, whisper_loss=0.09177, 
over 18569.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01089, ecapa_loss=0.0001722, whisper_loss=0.09121, over 3845044.12 frames. ], batch size: 74, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:36:52,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1815580.0, ans=0.0 2024-08-12 20:36:56,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1815580.0, ans=0.07 2024-08-12 20:37:10,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1815680.0, ans=0.125 2024-08-12 20:37:14,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1815680.0, ans=0.125 2024-08-12 20:37:16,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1815680.0, ans=0.2 2024-08-12 20:37:22,121 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0 2024-08-12 20:37:26,301 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.42 vs. limit=15.0 2024-08-12 20:37:38,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1815880.0, ans=0.0 2024-08-12 20:37:44,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-12 20:37:51,941 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
21 from LS+wenet, 24 from Vox, 30 from AS 2024-08-12 20:37:52,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1815980.0, ans=0.0 2024-08-12 20:37:53,446 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 27 from Vox, 32 from AS 2024-08-12 20:37:55,966 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 from AS 2024-08-12 20:37:59,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1816080.0, ans=0.0 2024-08-12 20:37:59,930 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7700, loss[loss=0.1011, beats_loss=0.008495, ecapa_loss=0.0001887, whisper_loss=0.09071, over 14305.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.0001713, whisper_loss=0.09119, over 3874523.96 frames. ], batch size: 55, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:38:00,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1816080.0, ans=0.2 2024-08-12 20:38:21,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1816180.0, ans=0.05 2024-08-12 20:38:22,626 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.211e-02 2024-08-12 20:38:24,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1816180.0, ans=0.125 2024-08-12 20:38:27,542 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.538e+01 2.763e+01 3.264e+01 5.327e+01, threshold=5.526e+01, percent-clipped=0.0 2024-08-12 20:38:46,002 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
16 from LS+wenet, 20 from Vox, 19 from AS 2024-08-12 20:38:47,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1816380.0, ans=0.125 2024-08-12 20:38:51,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1816380.0, ans=0.1 2024-08-12 20:39:05,158 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 from AS 2024-08-12 20:39:08,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7750, loss[loss=0.07111, beats_loss=0.01364, ecapa_loss=0.0001446, whisper_loss=0.05603, over 14192.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01087, ecapa_loss=0.000171, whisper_loss=0.0907, over 3871660.12 frames. ], batch size: 57, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:39:13,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1816580.0, ans=0.1 2024-08-12 20:39:14,971 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
24 from LS+wenet, 19 from Vox, 44 from AS 2024-08-12 20:39:26,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1816680.0, ans=0.0 2024-08-12 20:39:30,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1816680.0, ans=0.125 2024-08-12 20:39:33,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1816680.0, ans=0.2 2024-08-12 20:39:45,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1816780.0, ans=0.0 2024-08-12 20:39:58,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1816880.0, ans=0.0 2024-08-12 20:39:58,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1816880.0, ans=0.1 2024-08-12 20:40:07,561 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 from AS 2024-08-12 20:40:14,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1816980.0, ans=0.0 2024-08-12 20:40:14,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1816980.0, ans=0.0 2024-08-12 20:40:18,080 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7800, loss[loss=0.1205, beats_loss=0.007463, ecapa_loss=0.0002167, whisper_loss=0.1108, over 20021.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001714, whisper_loss=0.09134, over 3911785.28 frames. 
], batch size: 85, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:40:20,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1817080.0, ans=0.0 2024-08-12 20:40:23,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1817080.0, ans=0.125 2024-08-12 20:40:28,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1817080.0, ans=0.0 2024-08-12 20:40:38,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1817180.0, ans=0.015 2024-08-12 20:40:45,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.560e+01 2.836e+01 3.091e+01 4.411e+01, threshold=5.671e+01, percent-clipped=0.0 2024-08-12 20:41:05,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1817380.0, ans=0.05 2024-08-12 20:41:13,458 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 20 from Vox, 35 from AS 2024-08-12 20:41:17,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1817480.0, ans=0.125 2024-08-12 20:41:27,358 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7850, loss[loss=0.1221, beats_loss=0.01049, ecapa_loss=0.0001349, whisper_loss=0.1102, over 21558.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01085, ecapa_loss=0.000172, whisper_loss=0.09201, over 3940171.42 frames. ], batch size: 79, lr: 4.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:41:57,332 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
20 from LS+wenet, 19 from Vox, 32 from AS 2024-08-12 20:42:07,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1817880.0, ans=0.125 2024-08-12 20:42:10,048 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 from AS 2024-08-12 20:42:35,453 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 26 from Vox, 33 from AS 2024-08-12 20:42:35,985 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2024-08-12 20:42:36,553 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7900, loss[loss=0.1108, beats_loss=0.008654, ecapa_loss=0.0001803, whisper_loss=0.1004, over 23118.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0109, ecapa_loss=0.0001721, whisper_loss=0.09186, over 3924638.68 frames. ], batch size: 93, lr: 4.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:43:01,871 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 from AS 2024-08-12 20:43:04,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.497e+01 2.722e+01 3.152e+01 4.641e+01, threshold=5.444e+01, percent-clipped=0.0 2024-08-12 20:43:04,508 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 24 from Vox, 32 from AS 2024-08-12 20:43:14,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1818280.0, ans=0.125 2024-08-12 20:43:17,565 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.30 vs. 
limit=10.0 2024-08-12 20:43:25,249 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.356e-03 2024-08-12 20:43:32,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1818480.0, ans=0.2 2024-08-12 20:43:34,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1818480.0, ans=0.2 2024-08-12 20:43:45,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 7950, loss[loss=0.1247, beats_loss=0.009323, ecapa_loss=0.0002013, whisper_loss=0.1134, over 21603.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01085, ecapa_loss=0.0001721, whisper_loss=0.09276, over 3902939.76 frames. ], batch size: 88, lr: 4.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:43:48,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1818580.0, ans=0.0 2024-08-12 20:43:51,349 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 19 from Vox, 23 from AS 2024-08-12 20:43:53,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1818580.0, ans=0.07 2024-08-12 20:44:26,984 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 14 from Vox, 46 from AS 2024-08-12 20:44:32,871 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 from AS 2024-08-12 20:44:55,014 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8000, loss[loss=0.107, beats_loss=0.01047, ecapa_loss=0.0001834, whisper_loss=0.09472, over 21971.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01091, ecapa_loss=0.0001709, whisper_loss=0.09276, over 3925826.86 frames. ], batch size: 86, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:44:56,501 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
22 from LS+wenet, 18 from Vox, 23 from AS 2024-08-12 20:44:57,862 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 from AS 2024-08-12 20:45:09,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2024-08-12 20:45:09,749 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 29 from LS+wenet, 11 from Vox, 29 from AS 2024-08-12 20:45:22,512 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.456e+01 2.721e+01 3.092e+01 4.967e+01, threshold=5.442e+01, percent-clipped=0.0 2024-08-12 20:45:26,781 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 23 from Vox, 31 from AS 2024-08-12 20:45:27,048 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:45:28,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.30 vs. limit=12.0 2024-08-12 20:45:29,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1819280.0, ans=0.125 2024-08-12 20:45:36,999 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 9 from LS+wenet, 19 from Vox, 26 from AS 2024-08-12 20:45:38,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1819380.0, ans=0.0 2024-08-12 20:45:40,906 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 from AS 2024-08-12 20:45:48,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1819380.0, ans=0.125 2024-08-12 20:45:57,648 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 22 from Vox, 36 from AS 2024-08-12 20:46:04,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8050, loss[loss=0.1151, beats_loss=0.0104, ecapa_loss=0.0001723, whisper_loss=0.103, over 21539.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01092, ecapa_loss=0.0001722, whisper_loss=0.09229, over 3902730.06 frames. ], batch size: 88, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:46:40,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1819780.0, ans=0.125 2024-08-12 20:46:42,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.16 vs. limit=8.0 2024-08-12 20:47:01,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1819980.0, ans=0.025 2024-08-12 20:47:13,423 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8100, loss[loss=0.1195, beats_loss=0.01103, ecapa_loss=0.0001629, whisper_loss=0.1068, over 18378.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001722, whisper_loss=0.09152, over 3881078.71 frames. ], batch size: 73, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:47:20,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1820080.0, ans=0.1 2024-08-12 20:47:35,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1820180.0, ans=0.05 2024-08-12 20:47:36,475 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
33 from LS+wenet, 18 from Vox, 37 from AS 2024-08-12 20:47:40,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.501e+01 2.882e+01 3.230e+01 4.763e+01, threshold=5.764e+01, percent-clipped=0.0 2024-08-12 20:47:49,530 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 from AS 2024-08-12 20:47:52,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1820280.0, ans=0.0 2024-08-12 20:47:55,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1820380.0, ans=0.0 2024-08-12 20:48:03,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1820380.0, ans=0.125 2024-08-12 20:48:13,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1820480.0, ans=0.125 2024-08-12 20:48:22,276 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8150, loss[loss=0.09995, beats_loss=0.009203, ecapa_loss=0.000204, whisper_loss=0.0887, over 22051.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.0001725, whisper_loss=0.09144, over 3881655.21 frames. ], batch size: 90, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:48:43,502 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 from AS 2024-08-12 20:48:49,346 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.69 vs. 
limit=12.0 2024-08-12 20:49:12,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1820880.0, ans=0.0 2024-08-12 20:49:24,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1820980.0, ans=0.125 2024-08-12 20:49:31,648 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8200, loss[loss=0.1083, beats_loss=0.008293, ecapa_loss=0.0002006, whisper_loss=0.09803, over 22305.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01084, ecapa_loss=0.0001727, whisper_loss=0.09163, over 3886602.14 frames. ], batch size: 89, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:49:42,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1821080.0, ans=0.05 2024-08-12 20:49:50,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1821180.0, ans=0.0 2024-08-12 20:49:55,370 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 33 from LS+wenet, 20 from Vox, 30 from AS 2024-08-12 20:49:57,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1821180.0, ans=0.125 2024-08-12 20:49:59,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.516e+01 2.770e+01 3.136e+01 5.305e+01, threshold=5.540e+01, percent-clipped=0.0 2024-08-12 20:49:59,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1821280.0, ans=0.125 2024-08-12 20:50:03,773 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
16 from LS+wenet, 16 from Vox, 26 from AS 2024-08-12 20:50:12,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1821380.0, ans=0.125 2024-08-12 20:50:16,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1821380.0, ans=0.125 2024-08-12 20:50:22,566 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 from AS 2024-08-12 20:50:32,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1821480.0, ans=0.125 2024-08-12 20:50:38,083 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 from AS 2024-08-12 20:50:40,576 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8250, loss[loss=0.07533, beats_loss=0.01339, ecapa_loss=0.0001599, whisper_loss=0.06033, over 17474.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001712, whisper_loss=0.09123, over 3889937.16 frames. ], batch size: 73, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:50:41,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1821580.0, ans=0.1 2024-08-12 20:50:49,072 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 21 from Vox, 19 from AS 2024-08-12 20:50:52,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1821580.0, ans=0.125 2024-08-12 20:50:54,768 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
34 from LS+wenet, 21 from Vox, 33 from AS 2024-08-12 20:50:54,980 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.656e+01 2024-08-12 20:51:09,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1821780.0, ans=0.0 2024-08-12 20:51:10,567 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.004e-02 2024-08-12 20:51:21,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1821880.0, ans=0.2 2024-08-12 20:51:26,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.91 vs. limit=15.0 2024-08-12 20:51:27,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2024-08-12 20:51:30,959 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 from AS 2024-08-12 20:51:33,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1821880.0, ans=15.0 2024-08-12 20:51:40,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1821980.0, ans=0.125 2024-08-12 20:51:43,758 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 29 from LS+wenet, 12 from Vox, 28 from AS 2024-08-12 20:51:46,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1821980.0, ans=0.0 2024-08-12 20:51:50,238 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8300, loss[loss=0.08676, beats_loss=0.01163, ecapa_loss=0.0001987, whisper_loss=0.07314, over 16669.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01083, ecapa_loss=0.0001709, whisper_loss=0.09155, over 3876544.60 frames. ], batch size: 69, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:51:50,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1822080.0, ans=0.0 2024-08-12 20:51:58,901 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 19 from Vox, 38 from AS 2024-08-12 20:52:17,634 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.463e+01 2.692e+01 3.120e+01 9.968e+01, threshold=5.383e+01, percent-clipped=3.0 2024-08-12 20:52:36,700 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 26 from Vox, 33 from AS 2024-08-12 20:52:48,633 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 from AS 2024-08-12 20:52:55,242 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 from AS 2024-08-12 20:52:57,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1822580.0, ans=0.0 2024-08-12 20:52:58,054 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8350, loss[loss=0.09755, beats_loss=0.009554, ecapa_loss=0.0002171, whisper_loss=0.08582, over 16289.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01083, ecapa_loss=0.000172, whisper_loss=0.09202, over 3885504.66 frames. ], batch size: 69, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:53:25,370 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 16 from Vox, 39 from AS 2024-08-12 20:53:33,424 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 from AS 2024-08-12 20:53:51,858 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.43 vs. 
limit=10.0 2024-08-12 20:54:07,911 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8400, loss[loss=0.1093, beats_loss=0.01081, ecapa_loss=0.0001565, whisper_loss=0.0969, over 22540.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01091, ecapa_loss=0.0001716, whisper_loss=0.09151, over 3899965.26 frames. ], batch size: 89, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:54:35,957 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.526e+01 2.875e+01 3.220e+01 4.758e+01, threshold=5.750e+01, percent-clipped=0.0 2024-08-12 20:54:59,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1823380.0, ans=0.0 2024-08-12 20:55:02,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1823380.0, ans=0.0 2024-08-12 20:55:08,003 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 from AS 2024-08-12 20:55:18,859 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8450, loss[loss=0.09321, beats_loss=0.009874, ecapa_loss=0.0002104, whisper_loss=0.08124, over 14809.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001723, whisper_loss=0.09243, over 3899977.55 frames. ], batch size: 63, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:55:21,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1823580.0, ans=0.0 2024-08-12 20:55:27,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1823580.0, ans=0.125 2024-08-12 20:55:28,036 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 24 from Vox, 34 from AS 2024-08-12 20:55:28,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1823580.0, ans=0.125 2024-08-12 20:55:30,311 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2024-08-12 20:55:50,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1823780.0, ans=0.0 2024-08-12 20:56:30,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1824080.0, ans=0.0 2024-08-12 20:56:31,138 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8500, loss[loss=0.1106, beats_loss=0.01022, ecapa_loss=0.0002074, whisper_loss=0.09834, over 20982.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01095, ecapa_loss=0.0001717, whisper_loss=0.09111, over 3876689.85 frames. ], batch size: 89, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:56:57,583 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:57:01,695 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.569e+01 2.792e+01 3.196e+01 4.300e+01, threshold=5.585e+01, percent-clipped=0.0 2024-08-12 20:57:01,950 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 16 from Vox, 34 from AS 2024-08-12 20:57:07,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.20 vs. limit=12.0 2024-08-12 20:57:11,304 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. 
limit=15.0 2024-08-12 20:57:16,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1824380.0, ans=0.125 2024-08-12 20:57:19,258 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 from AS 2024-08-12 20:57:29,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1824480.0, ans=0.1 2024-08-12 20:57:46,080 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8550, loss[loss=0.06214, beats_loss=0.01272, ecapa_loss=0.0002012, whisper_loss=0.04741, over 15133.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01092, ecapa_loss=0.0001714, whisper_loss=0.09161, over 3914024.05 frames. ], batch size: 64, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:57:47,781 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 29 from Vox, 32 from AS 2024-08-12 20:57:49,790 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2024-08-12 20:58:23,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1824780.0, ans=0.125 2024-08-12 20:58:46,034 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 23 from Vox, 24 from AS 2024-08-12 20:58:52,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1824980.0, ans=0.125 2024-08-12 20:58:58,956 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8600, loss[loss=0.1009, beats_loss=0.01303, ecapa_loss=0.0001361, whisper_loss=0.08652, over 23140.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01082, ecapa_loss=0.0001721, whisper_loss=0.09211, over 3909929.74 frames. 
], batch size: 92, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:59:07,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1825080.0, ans=0.125 2024-08-12 20:59:08,091 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 26 from Vox, 45 from AS 2024-08-12 20:59:31,418 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.497e+01 2.777e+01 3.095e+01 5.281e+01, threshold=5.554e+01, percent-clipped=0.0 2024-08-12 20:59:48,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1825380.0, ans=0.07 2024-08-12 20:59:53,818 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 14 from LS+wenet, 27 from Vox, 32 from AS 2024-08-12 21:00:17,632 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8650, loss[loss=0.1123, beats_loss=0.009103, ecapa_loss=0.000218, whisper_loss=0.101, over 22208.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01081, ecapa_loss=0.000174, whisper_loss=0.09232, over 3922790.71 frames. ], batch size: 91, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:00:19,494 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 from AS 2024-08-12 21:00:30,145 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 from AS 2024-08-12 21:00:30,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1825580.0, ans=0.1 2024-08-12 21:00:44,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1825680.0, ans=0.1 2024-08-12 21:01:00,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.36 vs. 
limit=15.0 2024-08-12 21:01:14,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1825880.0, ans=0.04949747468305833 2024-08-12 21:01:15,226 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 21:01:27,125 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-12 21:01:33,195 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8700, loss[loss=0.1053, beats_loss=0.01061, ecapa_loss=0.0002035, whisper_loss=0.09262, over 21491.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01084, ecapa_loss=0.0001733, whisper_loss=0.09276, over 3904267.14 frames. ], batch size: 90, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:01:35,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1826080.0, ans=0.0 2024-08-12 21:01:37,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1826080.0, ans=0.125 2024-08-12 21:01:42,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1826080.0, ans=0.0 2024-08-12 21:02:00,034 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 21:02:03,047 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 21:02:03,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1826280.0, ans=0.2 2024-08-12 21:02:04,120 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.617e+01 2.806e+01 3.109e+01 1.024e+02, threshold=5.612e+01, percent-clipped=1.0 2024-08-12 21:02:22,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1826380.0, ans=0.09899494936611666 2024-08-12 21:02:22,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1826380.0, ans=0.1 2024-08-12 21:02:38,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5 2024-08-12 21:02:46,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1826480.0, ans=0.09899494936611666 2024-08-12 21:02:50,048 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8750, loss[loss=0.094, beats_loss=0.0121, ecapa_loss=0.0001608, whisper_loss=0.08029, over 23443.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01083, ecapa_loss=0.0001723, whisper_loss=0.09282, over 3900920.27 frames. ], batch size: 95, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:02:52,011 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-12 21:03:40,423 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
19 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 21:03:45,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1826880.0, ans=0.125 2024-08-12 21:03:54,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1826980.0, ans=0.0 2024-08-12 21:03:56,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1826980.0, ans=0.125 2024-08-12 21:04:00,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1826980.0, ans=0.0 2024-08-12 21:04:08,027 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8800, loss[loss=0.1076, beats_loss=0.01336, ecapa_loss=0.0001267, whisper_loss=0.09294, over 22987.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01096, ecapa_loss=0.000172, whisper_loss=0.09267, over 3901562.20 frames. ], batch size: 90, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:04:26,935 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 21:04:28,228 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 21:04:39,360 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.528e+01 2.804e+01 3.159e+01 1.036e+02, threshold=5.609e+01, percent-clipped=2.0 2024-08-12 21:04:50,947 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-12 21:04:54,699 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.985e+00 2024-08-12 21:05:01,946 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
20 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 21:05:02,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.87 vs. limit=15.0 2024-08-12 21:05:13,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.76 vs. limit=15.0 2024-08-12 21:05:18,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1827480.0, ans=0.1 2024-08-12 21:05:24,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1827480.0, ans=0.1 2024-08-12 21:05:26,724 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8850, loss[loss=0.1168, beats_loss=0.008469, ecapa_loss=0.0001744, whisper_loss=0.1066, over 18070.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01094, ecapa_loss=0.0001702, whisper_loss=0.09258, over 3932315.39 frames. ], batch size: 67, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:05:29,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.20 vs. limit=22.5 2024-08-12 21:05:33,956 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 21:05:38,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1827580.0, ans=0.125 2024-08-12 21:05:58,965 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 21:06:00,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. 
limit=6.0 2024-08-12 21:06:03,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1827780.0, ans=0.125 2024-08-12 21:06:09,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1827780.0, ans=0.09899494936611666 2024-08-12 21:06:22,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1827880.0, ans=0.125 2024-08-12 21:06:27,371 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 21:06:28,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2024-08-12 21:06:33,380 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 21:06:40,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1827980.0, ans=0.1 2024-08-12 21:06:40,868 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.31 vs. limit=22.5 2024-08-12 21:06:42,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8900, loss[loss=0.1156, beats_loss=0.009915, ecapa_loss=0.0001852, whisper_loss=0.1039, over 22939.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01097, ecapa_loss=0.0001703, whisper_loss=0.09207, over 3888723.60 frames. 
], batch size: 93, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:06:55,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1828080.0, ans=0.0 2024-08-12 21:07:10,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1828180.0, ans=0.125 2024-08-12 21:07:10,995 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-12 21:07:15,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.512e+01 2.855e+01 3.103e+01 6.109e+01, threshold=5.710e+01, percent-clipped=1.0 2024-08-12 21:07:18,741 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 21:07:22,158 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 8 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 21:07:25,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1828280.0, ans=0.0 2024-08-12 21:07:33,802 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 21:07:53,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.21 vs. limit=22.5 2024-08-12 21:07:59,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 8950, loss[loss=0.09355, beats_loss=0.00987, ecapa_loss=0.0001648, whisper_loss=0.08203, over 19313.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01103, ecapa_loss=0.0001705, whisper_loss=0.09166, over 3900964.00 frames. 
], batch size: 76, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:08:00,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1828580.0, ans=0.0 2024-08-12 21:08:02,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-08-12 21:08:09,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1828580.0, ans=0.035 2024-08-12 21:08:22,827 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 21:08:47,091 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-12 21:08:58,577 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 21:09:00,199 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 21:09:16,141 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9000, loss[loss=0.109, beats_loss=0.01314, ecapa_loss=0.0001347, whisper_loss=0.09454, over 23103.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001712, whisper_loss=0.09197, over 3912766.34 frames. ], batch size: 93, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:09:16,142 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 21:09:47,382 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2363, 2.9992, 2.4187, 2.6363], device='cuda:2') 2024-08-12 21:09:54,911 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005776, whisper_loss=0.2483, over 922467.00 frames. 
2024-08-12 21:10:13,769 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on SV_voxceleb1: loss=0.004711, beats_loss=0, ecapa_loss=0.0004711, whisper_loss=0, over 939242.00 frames. 2024-08-12 21:12:02,796 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on AT_audioset: loss=0.02411, beats_loss=0.02411, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 21:12:02,800 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 21:12:04,352 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 21:12:37,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.414e+01 2.685e+01 3.059e+01 6.063e+01, threshold=5.370e+01, percent-clipped=1.0 2024-08-12 21:12:37,597 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 21:12:41,898 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 21:13:02,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2024-08-12 21:13:07,793 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 21:13:15,592 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-12 21:13:22,654 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9050, loss[loss=0.09938, beats_loss=0.009391, ecapa_loss=0.0001377, whisper_loss=0.08861, over 14561.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.011, ecapa_loss=0.0001709, whisper_loss=0.0927, over 3921076.01 frames. 
], batch size: 53, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:13:24,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1829580.0, ans=0.125 2024-08-12 21:13:26,977 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 21:13:41,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1829680.0, ans=0.125 2024-08-12 21:13:44,970 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.19 vs. limit=22.5 2024-08-12 21:14:01,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1829780.0, ans=0.125 2024-08-12 21:14:07,607 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.125e-02 2024-08-12 21:14:38,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9100, loss[loss=0.1343, beats_loss=0.009142, ecapa_loss=0.0001627, whisper_loss=0.1236, over 23499.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01095, ecapa_loss=0.0001711, whisper_loss=0.09315, over 3939838.12 frames. ], batch size: 88, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:14:58,789 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-12 21:15:02,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.34 vs. 
limit=10.0 2024-08-12 21:15:11,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.477e+01 2.788e+01 3.055e+01 6.197e+01, threshold=5.576e+01, percent-clipped=1.0 2024-08-12 21:15:21,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1830280.0, ans=0.125 2024-08-12 21:15:27,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.61 vs. limit=22.5 2024-08-12 21:15:56,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9150, loss[loss=0.106, beats_loss=0.01249, ecapa_loss=0.0001556, whisper_loss=0.09199, over 22285.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01099, ecapa_loss=0.0001716, whisper_loss=0.09281, over 3924703.86 frames. ], batch size: 91, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:15:56,401 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 21:16:11,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1830680.0, ans=0.125 2024-08-12 21:16:27,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1830780.0, ans=0.125 2024-08-12 21:16:34,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1830780.0, ans=0.125 2024-08-12 21:16:38,482 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 35 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-12 21:16:47,501 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 21:16:48,162 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.819e-02 2024-08-12 21:16:50,921 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 21:16:59,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1830980.0, ans=0.95 2024-08-12 21:17:03,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1830980.0, ans=0.125 2024-08-12 21:17:10,381 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9200, loss[loss=0.1149, beats_loss=0.01009, ecapa_loss=0.0001185, whisper_loss=0.1037, over 16586.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01095, ecapa_loss=0.0001713, whisper_loss=0.09351, over 3938186.32 frames. ], batch size: 62, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:17:18,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1831080.0, ans=0.0 2024-08-12 21:17:19,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1831080.0, ans=0.0 2024-08-12 21:17:23,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.83 vs. 
limit=15.0 2024-08-12 21:17:29,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1831180.0, ans=0.1 2024-08-12 21:17:36,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1831180.0, ans=0.1 2024-08-12 21:17:42,020 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.481e+01 2.738e+01 3.160e+01 4.519e+01, threshold=5.476e+01, percent-clipped=0.0 2024-08-12 21:17:45,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1831280.0, ans=0.1 2024-08-12 21:17:47,008 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-12 21:17:51,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1831280.0, ans=0.1 2024-08-12 21:17:55,442 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 21:18:00,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1831380.0, ans=0.2 2024-08-12 21:18:10,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1831480.0, ans=0.125 2024-08-12 21:18:18,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.27 vs. limit=10.0 2024-08-12 21:18:22,407 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 21:18:26,385 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9250, loss[loss=0.1199, beats_loss=0.01066, ecapa_loss=0.0002191, whisper_loss=0.1071, over 21135.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01097, ecapa_loss=0.0001721, whisper_loss=0.09274, over 3927494.54 frames. ], batch size: 88, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:18:34,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1831580.0, ans=0.125 2024-08-12 21:18:37,316 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 21:18:47,913 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 21:18:55,513 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 16 from LS+wenet, 34 from Vox, 41 fro AS 2024-08-12 21:19:41,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2024-08-12 21:19:41,598 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9300, loss[loss=0.1157, beats_loss=0.009731, ecapa_loss=0.0001669, whisper_loss=0.1043, over 22965.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01089, ecapa_loss=0.0001728, whisper_loss=0.09274, over 3954105.23 frames. ], batch size: 91, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:20:06,643 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
22 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-12 21:20:11,867 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.613e+01 2.997e+01 3.337e+01 4.853e+01, threshold=5.993e+01, percent-clipped=0.0 2024-08-12 21:20:37,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1832380.0, ans=0.125 2024-08-12 21:20:41,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1832480.0, ans=0.2 2024-08-12 21:20:50,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1832480.0, ans=0.125 2024-08-12 21:20:51,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1832480.0, ans=0.2 2024-08-12 21:20:53,440 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 21:20:54,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9350, loss[loss=0.09578, beats_loss=0.0128, ecapa_loss=0.0001412, whisper_loss=0.08156, over 23102.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01085, ecapa_loss=0.0001712, whisper_loss=0.09301, over 3937821.25 frames. ], batch size: 91, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:21:03,744 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 21:21:04,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1832580.0, ans=0.125 2024-08-12 21:21:26,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1832780.0, ans=0.2 2024-08-12 21:21:38,713 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 21:21:40,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1832880.0, ans=0.125 2024-08-12 21:21:43,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1832880.0, ans=0.2 2024-08-12 21:21:45,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-12 21:21:46,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1832880.0, ans=0.125 2024-08-12 21:21:52,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1832980.0, ans=0.2 2024-08-12 21:21:56,279 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-12 21:21:58,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1832980.0, ans=0.0 2024-08-12 21:22:01,215 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-12 21:22:08,366 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9400, loss[loss=0.1079, beats_loss=0.009722, ecapa_loss=0.0001316, whisper_loss=0.09688, over 21017.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01094, ecapa_loss=0.0001715, whisper_loss=0.09242, over 3942888.67 frames. ], batch size: 76, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:22:16,169 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-12 21:22:17,611 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 21:22:17,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1833080.0, ans=0.125 2024-08-12 21:22:21,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1833080.0, ans=0.125 2024-08-12 21:22:26,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1833180.0, ans=0.0 2024-08-12 21:22:40,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.361e+01 2.679e+01 2.977e+01 4.432e+01, threshold=5.358e+01, percent-clipped=0.0 2024-08-12 21:23:11,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1833480.0, ans=0.125 2024-08-12 21:23:15,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1833480.0, ans=0.035 2024-08-12 21:23:24,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9450, loss[loss=0.1042, beats_loss=0.0108, ecapa_loss=0.0001619, whisper_loss=0.09182, over 15389.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01099, ecapa_loss=0.00017, whisper_loss=0.09232, over 3921723.24 frames. ], batch size: 58, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:23:24,784 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 21:23:43,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1833680.0, ans=0.125 2024-08-12 21:24:12,108 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
35 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 21:24:14,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1833880.0, ans=0.125 2024-08-12 21:24:31,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1833980.0, ans=0.125 2024-08-12 21:24:34,364 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.511e+01 2024-08-12 21:24:39,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9500, loss[loss=0.0991, beats_loss=0.00969, ecapa_loss=0.0001792, whisper_loss=0.08762, over 22779.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01101, ecapa_loss=0.0001715, whisper_loss=0.09153, over 3937486.66 frames. ], batch size: 91, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:24:57,564 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 21:25:01,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1834180.0, ans=0.0 2024-08-12 21:25:04,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1834180.0, ans=0.0 2024-08-12 21:25:09,429 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.426e+01 2.699e+01 3.219e+01 5.763e+01, threshold=5.398e+01, percent-clipped=1.0 2024-08-12 21:25:09,624 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 21:25:11,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1834280.0, ans=0.2 2024-08-12 21:25:13,728 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
19 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-12 21:25:14,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1834280.0, ans=0.0 2024-08-12 21:25:20,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1834280.0, ans=0.0 2024-08-12 21:25:29,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1834380.0, ans=0.07 2024-08-12 21:25:35,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1834480.0, ans=0.125 2024-08-12 21:25:38,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1834480.0, ans=0.05 2024-08-12 21:25:42,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1834480.0, ans=0.0 2024-08-12 21:25:49,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1834580.0, ans=0.1 2024-08-12 21:25:50,205 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9550, loss[loss=0.1029, beats_loss=0.0111, ecapa_loss=0.0001636, whisper_loss=0.09014, over 22086.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.0001735, whisper_loss=0.09176, over 3928107.61 frames. ], batch size: 92, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:26:09,095 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. 
limit=15.0 2024-08-12 21:26:21,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1834780.0, ans=0.1 2024-08-12 21:27:01,610 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9600, loss[loss=0.06637, beats_loss=0.01129, ecapa_loss=0.0001553, whisper_loss=0.05353, over 14960.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01092, ecapa_loss=0.0001734, whisper_loss=0.09154, over 3933243.49 frames. ], batch size: 59, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:27:07,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1835080.0, ans=0.0 2024-08-12 21:27:10,220 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 21:27:14,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1835180.0, ans=0.125 2024-08-12 21:27:24,955 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 21:27:27,922 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.584e+01 2024-08-12 21:27:30,210 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.578e+01 2.916e+01 3.452e+01 6.223e+01, threshold=5.833e+01, percent-clipped=1.0 2024-08-12 21:27:47,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-12 21:27:51,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1835380.0, ans=0.125 2024-08-12 21:27:54,346 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 21:27:58,395 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 21:28:05,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1835480.0, ans=0.0 2024-08-12 21:28:06,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1835480.0, ans=0.1 2024-08-12 21:28:10,165 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9650, loss[loss=0.1041, beats_loss=0.01052, ecapa_loss=0.0001295, whisper_loss=0.09226, over 23601.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.0001729, whisper_loss=0.0914, over 3900961.85 frames. ], batch size: 91, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:28:51,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1835880.0, ans=0.1 2024-08-12 21:29:00,380 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 21:29:06,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1835980.0, ans=0.125 2024-08-12 21:29:07,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1835980.0, ans=0.125 2024-08-12 21:29:18,568 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 21:29:19,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9700, loss[loss=0.09592, beats_loss=0.00986, ecapa_loss=0.0001505, whisper_loss=0.08455, over 16091.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01087, ecapa_loss=0.0001742, whisper_loss=0.09188, over 3914987.41 frames. 
], batch size: 62, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:29:30,750 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 21:29:34,848 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 21:29:36,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1836180.0, ans=0.125 2024-08-12 21:29:39,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1836180.0, ans=0.125 2024-08-12 21:29:48,838 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.429e+01 2.686e+01 3.028e+01 5.758e+01, threshold=5.372e+01, percent-clipped=0.0 2024-08-12 21:30:03,010 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 21:30:09,249 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=15.0 2024-08-12 21:30:10,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-08-12 21:30:24,705 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.021e-01 2024-08-12 21:30:30,595 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9750, loss[loss=0.1172, beats_loss=0.01106, ecapa_loss=0.0001656, whisper_loss=0.1045, over 15411.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01094, ecapa_loss=0.0001724, whisper_loss=0.09121, over 3910212.99 frames. ], batch size: 58, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:30:30,837 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
24 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 21:30:35,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. limit=6.0 2024-08-12 21:30:37,136 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=22.5 2024-08-12 21:30:50,216 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 21:31:07,877 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=15.0 2024-08-12 21:31:15,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1836880.0, ans=0.2 2024-08-12 21:31:30,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1836980.0, ans=0.0 2024-08-12 21:31:32,367 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 9 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 21:31:42,703 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9800, loss[loss=0.1233, beats_loss=0.009731, ecapa_loss=0.0001597, whisper_loss=0.1119, over 22987.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01097, ecapa_loss=0.0001719, whisper_loss=0.09125, over 3915717.04 frames. ], batch size: 90, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:31:44,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1837080.0, ans=0.125 2024-08-12 21:32:06,421 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.94 vs. 
limit=22.5 2024-08-12 21:32:12,375 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.082e+01 2.453e+01 2.781e+01 3.151e+01 8.550e+01, threshold=5.562e+01, percent-clipped=1.0 2024-08-12 21:32:13,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1837280.0, ans=0.125 2024-08-12 21:32:26,495 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 21:32:34,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1837380.0, ans=0.125 2024-08-12 21:32:48,596 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 21:32:55,381 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9850, loss[loss=0.1197, beats_loss=0.009657, ecapa_loss=0.0001303, whisper_loss=0.1087, over 14762.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01094, ecapa_loss=0.0001731, whisper_loss=0.09171, over 3904235.85 frames. ], batch size: 54, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:33:28,472 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-12 21:33:39,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1837880.0, ans=0.125 2024-08-12 21:34:01,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1837980.0, ans=0.0 2024-08-12 21:34:06,974 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9900, loss[loss=0.117, beats_loss=0.01274, ecapa_loss=0.0001614, whisper_loss=0.1026, over 19603.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01091, ecapa_loss=0.000171, whisper_loss=0.09267, over 3907077.95 frames. 
], batch size: 78, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:34:32,924 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 21:34:34,929 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=22.5 2024-08-12 21:34:36,938 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.567e+01 2.799e+01 3.140e+01 5.231e+01, threshold=5.598e+01, percent-clipped=0.0 2024-08-12 21:34:39,969 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 21:34:51,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1838380.0, ans=0.0 2024-08-12 21:34:52,783 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 21:34:54,810 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=12.0 2024-08-12 21:35:00,154 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-12 21:35:02,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1838380.0, ans=0.0 2024-08-12 21:35:19,999 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-12 21:35:21,877 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 9950, loss[loss=0.1007, beats_loss=0.009284, ecapa_loss=0.0002102, whisper_loss=0.08928, over 21392.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01089, ecapa_loss=0.0001713, whisper_loss=0.09227, over 3917522.39 frames. 
], batch size: 91, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:35:27,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2024-08-12 21:35:33,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=12.0 2024-08-12 21:35:52,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1838780.0, ans=0.0 2024-08-12 21:36:03,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1838780.0, ans=0.125 2024-08-12 21:36:04,694 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 21:36:11,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1838880.0, ans=0.125 2024-08-12 21:36:14,554 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 21:36:19,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1838880.0, ans=0.1 2024-08-12 21:36:28,765 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 31 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 21:36:28,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1838980.0, ans=0.0 2024-08-12 21:36:36,577 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10000, loss[loss=0.09405, beats_loss=0.01108, ecapa_loss=0.0001897, whisper_loss=0.08108, over 15208.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.0001717, whisper_loss=0.09235, over 3901198.40 frames. 
], batch size: 61, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:36:36,808 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 21:36:42,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1839080.0, ans=0.0 2024-08-12 21:37:01,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1839180.0, ans=0.125 2024-08-12 21:37:01,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1839180.0, ans=0.07 2024-08-12 21:37:06,047 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.540e+01 2.812e+01 3.144e+01 2.734e+02, threshold=5.624e+01, percent-clipped=2.0 2024-08-12 21:37:18,412 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0 2024-08-12 21:37:25,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1839380.0, ans=0.125 2024-08-12 21:37:33,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1839480.0, ans=0.1 2024-08-12 21:37:43,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1839480.0, ans=0.125 2024-08-12 21:37:43,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1839480.0, ans=0.125 2024-08-12 21:37:46,033 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-12 21:37:48,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10050, loss[loss=0.1124, beats_loss=0.0102, ecapa_loss=0.0001402, whisper_loss=0.1008, over 19803.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01095, ecapa_loss=0.0001712, whisper_loss=0.09234, over 3910339.41 frames. ], batch size: 75, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:38:12,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2024-08-12 21:38:16,239 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-08-12 21:38:31,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1839780.0, ans=0.0 2024-08-12 21:38:48,411 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 21:38:50,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1839880.0, ans=0.125 2024-08-12 21:39:00,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1839980.0, ans=0.0 2024-08-12 21:39:02,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1839980.0, ans=0.125 2024-08-12 21:39:08,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.09 vs. limit=15.0 2024-08-12 21:39:12,401 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10100, loss[loss=0.1018, beats_loss=0.00993, ecapa_loss=0.0001901, whisper_loss=0.09002, over 21829.00 frames. 
], tot_loss[loss=0.1053, beats_loss=0.01098, ecapa_loss=0.000171, whisper_loss=0.09256, over 3930709.05 frames. ], batch size: 89, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:39:18,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.51 vs. limit=15.0 2024-08-12 21:39:45,292 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.534e+01 2.755e+01 3.172e+01 9.610e+01, threshold=5.510e+01, percent-clipped=1.0 2024-08-12 21:40:02,758 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 21:40:22,594 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 21:40:32,556 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2024-08-12 21:40:34,876 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10150, loss[loss=0.09954, beats_loss=0.01275, ecapa_loss=0.0001238, whisper_loss=0.08555, over 18585.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01102, ecapa_loss=0.000172, whisper_loss=0.09224, over 3967545.64 frames. ], batch size: 73, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:40:39,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.27 vs. limit=10.0 2024-08-12 21:40:40,779 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 21:40:41,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1840580.0, ans=0.0 2024-08-12 21:40:41,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1840580.0, ans=0.0 2024-08-12 21:40:48,162 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-12 21:40:52,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1840680.0, ans=0.125 2024-08-12 21:40:53,385 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.42 vs. limit=6.0 2024-08-12 21:41:21,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1840780.0, ans=0.0 2024-08-12 21:42:08,510 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10200, loss[loss=0.1013, beats_loss=0.01161, ecapa_loss=0.0001736, whisper_loss=0.08797, over 22362.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01105, ecapa_loss=0.0001718, whisper_loss=0.09225, over 3935148.68 frames. ], batch size: 93, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:42:23,555 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 25 from LS+wenet, 21 from Vox, 50 fro AS 2024-08-12 21:42:36,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1841180.0, ans=0.125 2024-08-12 21:42:41,973 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 21:42:49,087 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.23 vs. 
limit=6.0 2024-08-12 21:42:53,950 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.450e+01 2.670e+01 3.042e+01 4.548e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-12 21:43:33,636 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 21:43:40,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1841480.0, ans=0.2 2024-08-12 21:43:42,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1841480.0, ans=0.0 2024-08-12 21:43:53,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1841480.0, ans=0.0 2024-08-12 21:43:57,409 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10250, loss[loss=0.08251, beats_loss=0.01216, ecapa_loss=0.0001607, whisper_loss=0.06874, over 15840.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01098, ecapa_loss=0.0001724, whisper_loss=0.09265, over 3915360.50 frames. ], batch size: 63, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:44:27,836 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-12 21:44:42,073 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 21:44:46,267 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
22 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 21:44:47,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1841780.0, ans=0.125 2024-08-12 21:44:51,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1841780.0, ans=0.1 2024-08-12 21:45:01,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=1841880.0, ans=15.0 2024-08-12 21:45:06,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1841880.0, ans=0.125 2024-08-12 21:45:35,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1841980.0, ans=0.2 2024-08-12 21:45:47,182 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10300, loss[loss=0.1036, beats_loss=0.01275, ecapa_loss=0.000192, whisper_loss=0.0889, over 16565.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01098, ecapa_loss=0.0001724, whisper_loss=0.09244, over 3934150.91 frames. ], batch size: 71, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:45:47,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1842080.0, ans=0.0 2024-08-12 21:45:52,272 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-12 21:46:26,331 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 21:46:37,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.506e+01 2.751e+01 3.160e+01 4.441e+01, threshold=5.501e+01, percent-clipped=0.0 2024-08-12 21:47:01,450 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 21:47:04,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1842380.0, ans=0.2 2024-08-12 21:47:13,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1842380.0, ans=0.125 2024-08-12 21:47:33,632 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10350, loss[loss=0.09924, beats_loss=0.01334, ecapa_loss=0.0001387, whisper_loss=0.08452, over 22623.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01104, ecapa_loss=0.0001705, whisper_loss=0.09206, over 3936047.70 frames. ], batch size: 94, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:47:35,030 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 21:47:49,528 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-12 21:48:20,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1842880.0, ans=0.125 2024-08-12 21:48:22,792 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 21:48:32,739 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.73 vs. limit=15.0 2024-08-12 21:48:38,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1842980.0, ans=0.125 2024-08-12 21:48:45,897 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10400, loss[loss=0.1066, beats_loss=0.01104, ecapa_loss=0.0001831, whisper_loss=0.09376, over 16729.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01099, ecapa_loss=0.0001705, whisper_loss=0.09215, over 3909065.32 frames. 
], batch size: 66, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:49:04,095 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 21:49:07,329 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 21:49:16,968 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.438e+01 2.753e+01 3.076e+01 5.598e+01, threshold=5.505e+01, percent-clipped=1.0 2024-08-12 21:49:17,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1843280.0, ans=0.125 2024-08-12 21:49:18,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1843280.0, ans=0.2 2024-08-12 21:49:45,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1843480.0, ans=0.2 2024-08-12 21:49:59,583 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10450, loss[loss=0.06214, beats_loss=0.01282, ecapa_loss=0.0001606, whisper_loss=0.04772, over 15242.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01096, ecapa_loss=0.000171, whisper_loss=0.09243, over 3898037.25 frames. ], batch size: 63, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:50:07,135 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 21:50:12,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1843580.0, ans=0.0 2024-08-12 21:50:14,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1843680.0, ans=0.025 2024-08-12 21:50:29,659 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
29 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 21:50:34,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1843780.0, ans=0.05 2024-08-12 21:50:37,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1843780.0, ans=0.0 2024-08-12 21:51:01,727 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-12 21:51:06,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1843980.0, ans=0.2 2024-08-12 21:51:14,315 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10500, loss[loss=0.1082, beats_loss=0.01211, ecapa_loss=0.000164, whisper_loss=0.09449, over 22953.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01098, ecapa_loss=0.0001711, whisper_loss=0.09193, over 3881938.01 frames. ], batch size: 94, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:51:18,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1844080.0, ans=0.0 2024-08-12 21:51:29,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.24 vs. limit=22.5 2024-08-12 21:51:34,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1844180.0, ans=0.125 2024-08-12 21:51:35,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1844180.0, ans=0.0 2024-08-12 21:51:38,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1844180.0, ans=0.2 2024-08-12 21:51:39,830 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
37 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 21:51:45,368 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.358e+01 2.688e+01 3.093e+01 1.105e+02, threshold=5.376e+01, percent-clipped=1.0 2024-08-12 21:51:54,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1844280.0, ans=0.125 2024-08-12 21:52:02,310 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 21:52:02,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1844380.0, ans=0.0 2024-08-12 21:52:04,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1844380.0, ans=0.025 2024-08-12 21:52:13,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1844380.0, ans=0.125 2024-08-12 21:52:20,578 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 21:52:24,615 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 21:52:30,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10550, loss[loss=0.107, beats_loss=0.009539, ecapa_loss=0.0001672, whisper_loss=0.09574, over 16787.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01095, ecapa_loss=0.0001722, whisper_loss=0.0922, over 3870382.64 frames. 
], batch size: 64, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:52:47,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1844680.0, ans=0.125 2024-08-12 21:52:55,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1844680.0, ans=0.125 2024-08-12 21:52:56,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1844680.0, ans=0.125 2024-08-12 21:53:19,068 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 21:53:27,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1844880.0, ans=0.07 2024-08-12 21:53:29,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1844880.0, ans=0.0 2024-08-12 21:53:35,484 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 21:53:48,912 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10600, loss[loss=0.05206, beats_loss=0.01509, ecapa_loss=0.0001409, whisper_loss=0.03555, over 12967.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01088, ecapa_loss=0.0001732, whisper_loss=0.09222, over 3895881.51 frames. ], batch size: 53, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:53:52,357 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 37 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 21:53:59,327 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2024-08-12 21:54:11,102 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-12 21:54:14,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1845180.0, ans=0.0 2024-08-12 21:54:21,078 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.510e+01 2.765e+01 3.245e+01 5.665e+01, threshold=5.530e+01, percent-clipped=1.0 2024-08-12 21:54:27,410 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 21:54:34,226 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 21:54:35,874 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 37 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 21:54:49,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1845480.0, ans=0.0 2024-08-12 21:54:52,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1845480.0, ans=0.125 2024-08-12 21:54:53,094 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-12 21:55:02,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1845580.0, ans=0.1 2024-08-12 21:55:04,105 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10650, loss[loss=0.1188, beats_loss=0.01149, ecapa_loss=0.0001499, whisper_loss=0.1058, over 18718.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01086, ecapa_loss=0.0001717, whisper_loss=0.0924, over 3902035.03 frames. ], batch size: 75, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:55:05,669 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-12 21:55:05,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1845580.0, ans=0.125 2024-08-12 21:55:13,214 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 21:55:26,316 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 21:55:29,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1845680.0, ans=0.0 2024-08-12 21:55:40,337 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-12 21:55:43,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1845780.0, ans=0.1 2024-08-12 21:55:49,352 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 21:55:53,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1845880.0, ans=0.125 2024-08-12 21:55:59,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1845880.0, ans=0.125 2024-08-12 21:56:13,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1845980.0, ans=0.2 2024-08-12 21:56:23,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10700, loss[loss=0.1113, beats_loss=0.01126, ecapa_loss=0.0001414, whisper_loss=0.09863, over 24375.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01081, ecapa_loss=0.0001712, whisper_loss=0.09299, over 3893509.46 frames. 
], batch size: 93, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:56:29,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1846080.0, ans=0.0 2024-08-12 21:56:33,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0 2024-08-12 21:56:55,467 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.619e+01 2.989e+01 3.264e+01 5.454e+01, threshold=5.979e+01, percent-clipped=0.0 2024-08-12 21:56:55,762 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 21:56:58,949 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 21:57:09,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1846380.0, ans=0.2 2024-08-12 21:57:11,086 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 21:57:32,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1846480.0, ans=0.09899494936611666 2024-08-12 21:57:40,241 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10750, loss[loss=0.1003, beats_loss=0.01117, ecapa_loss=0.0001594, whisper_loss=0.0875, over 21551.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01079, ecapa_loss=0.0001711, whisper_loss=0.09333, over 3881329.91 frames. ], batch size: 87, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:57:40,455 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 21:57:47,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.20 vs. 
limit=15.0 2024-08-12 21:57:51,022 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.60 vs. limit=10.0 2024-08-12 21:57:59,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1846680.0, ans=0.125 2024-08-12 21:58:09,915 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 21:58:35,059 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 21:58:53,653 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10800, loss[loss=0.1147, beats_loss=0.008644, ecapa_loss=0.0001727, whisper_loss=0.1043, over 15818.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01087, ecapa_loss=0.0001706, whisper_loss=0.09318, over 3862494.41 frames. ], batch size: 59, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:59:03,523 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 21:59:18,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1847180.0, ans=0.1 2024-08-12 21:59:23,023 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.579e-02 2024-08-12 21:59:23,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.464e+01 2.831e+01 3.292e+01 5.711e+01, threshold=5.661e+01, percent-clipped=0.0 2024-08-12 21:59:27,959 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 21:59:30,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-08-12 21:59:32,683 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 21:59:44,100 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 21:59:44,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1847380.0, ans=0.125 2024-08-12 22:00:05,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10850, loss[loss=0.105, beats_loss=0.01285, ecapa_loss=0.0001368, whisper_loss=0.09076, over 22379.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01092, ecapa_loss=0.0001699, whisper_loss=0.09363, over 3913378.90 frames. ], batch size: 89, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:00:18,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.67 vs. limit=22.5 2024-08-12 22:00:25,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1847680.0, ans=0.0 2024-08-12 22:00:32,808 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 22:00:51,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1847880.0, ans=0.035 2024-08-12 22:00:54,189 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 34 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 22:00:57,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.83 vs. limit=15.0 2024-08-12 22:01:01,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1847980.0, ans=0.125 2024-08-12 22:01:07,448 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
19 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-12 22:01:17,069 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10900, loss[loss=0.1046, beats_loss=0.009429, ecapa_loss=0.0001589, whisper_loss=0.09361, over 21006.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01092, ecapa_loss=0.0001706, whisper_loss=0.09271, over 3892599.90 frames. ], batch size: 77, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:01:22,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1848080.0, ans=10.0 2024-08-12 22:01:22,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1848080.0, ans=0.125 2024-08-12 22:01:23,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1848080.0, ans=0.0 2024-08-12 22:01:28,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. 
limit=15.0 2024-08-12 22:01:45,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1848280.0, ans=15.0 2024-08-12 22:01:48,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.539e+01 2.752e+01 3.152e+01 5.586e+01, threshold=5.505e+01, percent-clipped=0.0 2024-08-12 22:01:51,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1848280.0, ans=0.1 2024-08-12 22:01:59,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1848280.0, ans=0.0 2024-08-12 22:02:02,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1848380.0, ans=0.125 2024-08-12 22:02:10,461 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 22:02:17,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1848480.0, ans=0.125 2024-08-12 22:02:19,693 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 22:02:32,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 10950, loss[loss=0.1041, beats_loss=0.00999, ecapa_loss=0.0001473, whisper_loss=0.09261, over 23534.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0109, ecapa_loss=0.0001699, whisper_loss=0.09251, over 3915914.66 frames. 
], batch size: 90, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:02:46,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1848680.0, ans=0.0 2024-08-12 22:02:48,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1848680.0, ans=0.0 2024-08-12 22:03:03,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1848780.0, ans=0.1 2024-08-12 22:03:20,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1848880.0, ans=0.0 2024-08-12 22:03:23,171 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.846e-01 2024-08-12 22:03:34,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1848980.0, ans=0.125 2024-08-12 22:03:47,214 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11000, loss[loss=0.08829, beats_loss=0.01191, ecapa_loss=0.0002184, whisper_loss=0.07419, over 18090.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01089, ecapa_loss=0.0001699, whisper_loss=0.09206, over 3911665.76 frames. ], batch size: 79, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:03:51,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=12.0 2024-08-12 22:03:52,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1849080.0, ans=0.2 2024-08-12 22:03:53,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-08-12 22:03:59,812 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
35 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 22:04:00,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1849180.0, ans=0.125 2024-08-12 22:04:16,180 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 22:04:17,766 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-12 22:04:18,786 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.465e+01 2.797e+01 3.199e+01 6.867e+01, threshold=5.594e+01, percent-clipped=1.0 2024-08-12 22:04:32,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1849380.0, ans=0.125 2024-08-12 22:04:57,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1849580.0, ans=0.125 2024-08-12 22:04:58,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11050, loss[loss=0.08358, beats_loss=0.01105, ecapa_loss=0.0001649, whisper_loss=0.07088, over 21516.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01088, ecapa_loss=0.0001708, whisper_loss=0.09167, over 3926354.82 frames. ], batch size: 90, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:05:06,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1849580.0, ans=0.125 2024-08-12 22:05:28,016 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-12 22:05:37,673 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 22:05:55,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1849880.0, ans=0.0 2024-08-12 22:06:00,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.20 vs. limit=15.0 2024-08-12 22:06:06,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1849980.0, ans=0.2 2024-08-12 22:06:11,795 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11100, loss[loss=0.1279, beats_loss=0.009076, ecapa_loss=0.0001926, whisper_loss=0.1169, over 22385.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01085, ecapa_loss=0.0001709, whisper_loss=0.09211, over 3901375.04 frames. ], batch size: 90, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:06:12,934 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=15.0 2024-08-12 22:06:16,421 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 22:06:30,226 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 22:06:34,586 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
24 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 22:06:44,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.458e+01 2.677e+01 3.068e+01 5.581e+01, threshold=5.354e+01, percent-clipped=0.0 2024-08-12 22:07:06,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1850380.0, ans=0.125 2024-08-12 22:07:06,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1850380.0, ans=0.1 2024-08-12 22:07:12,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.77 vs. limit=22.5 2024-08-12 22:07:17,761 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-12 22:07:26,810 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11150, loss[loss=0.1181, beats_loss=0.008723, ecapa_loss=0.0001906, whisper_loss=0.1074, over 18865.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01088, ecapa_loss=0.0001702, whisper_loss=0.09196, over 3896655.48 frames. ], batch size: 76, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:07:40,721 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 17 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 22:07:43,436 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 22:07:45,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1850680.0, ans=0.125 2024-08-12 22:07:50,087 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. 
limit=15.0 2024-08-12 22:08:01,366 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.874e+01 2024-08-12 22:08:31,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1850980.0, ans=0.2 2024-08-12 22:08:33,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1850980.0, ans=0.0 2024-08-12 22:08:38,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1850980.0, ans=0.1 2024-08-12 22:08:41,201 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2024-08-12 22:08:41,523 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11200, loss[loss=0.1287, beats_loss=0.008683, ecapa_loss=0.0001807, whisper_loss=0.1183, over 20702.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.0001711, whisper_loss=0.09174, over 3915844.81 frames. ], batch size: 81, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:08:59,563 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 22:09:10,048 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 22:09:10,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.09 vs. 
limit=12.0 2024-08-12 22:09:14,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.512e+01 2.839e+01 3.173e+01 1.150e+02, threshold=5.678e+01, percent-clipped=1.0 2024-08-12 22:09:16,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1851280.0, ans=0.125 2024-08-12 22:09:36,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1851380.0, ans=0.2 2024-08-12 22:09:43,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1851480.0, ans=0.125 2024-08-12 22:09:43,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. limit=6.0 2024-08-12 22:10:00,854 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11250, loss[loss=0.1167, beats_loss=0.01083, ecapa_loss=0.0001415, whisper_loss=0.1045, over 18016.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01089, ecapa_loss=0.0001716, whisper_loss=0.09199, over 3888226.67 frames. ], batch size: 69, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:10:09,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=1851580.0, ans=15.0 2024-08-12 22:10:14,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1851580.0, ans=0.125 2024-08-12 22:10:35,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1851780.0, ans=0.125 2024-08-12 22:10:39,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1851780.0, ans=0.2 2024-08-12 22:10:46,819 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
26 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 22:10:48,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1851880.0, ans=0.125 2024-08-12 22:11:18,057 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11300, loss[loss=0.09622, beats_loss=0.01087, ecapa_loss=0.00018, whisper_loss=0.08354, over 21262.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01094, ecapa_loss=0.00017, whisper_loss=0.09171, over 3928862.15 frames. ], batch size: 87, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:11:28,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1852080.0, ans=0.0 2024-08-12 22:11:32,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1852080.0, ans=0.2 2024-08-12 22:11:44,207 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 22:11:44,561 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.315e+01 2024-08-12 22:11:50,982 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 26 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-12 22:11:55,613 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.542e+01 2.832e+01 3.166e+01 7.074e+01, threshold=5.665e+01, percent-clipped=1.0 2024-08-12 22:12:18,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1852380.0, ans=0.125 2024-08-12 22:12:22,313 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.17 vs. limit=22.5 2024-08-12 22:12:28,481 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-12 22:12:36,302 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
32 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 22:12:40,707 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11350, loss[loss=0.1109, beats_loss=0.01165, ecapa_loss=0.000167, whisper_loss=0.09754, over 23134.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01086, ecapa_loss=0.0001703, whisper_loss=0.09233, over 3938721.78 frames. ], batch size: 93, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:12:43,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1852580.0, ans=0.04949747468305833 2024-08-12 22:12:54,372 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 13 from Vox, 46 fro AS 2024-08-12 22:12:56,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1852680.0, ans=0.2 2024-08-12 22:13:22,238 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0 2024-08-12 22:13:41,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1852880.0, ans=0.125 2024-08-12 22:13:52,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1852980.0, ans=0.0 2024-08-12 22:13:55,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.56 vs. limit=8.0 2024-08-12 22:13:57,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0 2024-08-12 22:14:02,000 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11400, loss[loss=0.1214, beats_loss=0.008753, ecapa_loss=0.0001544, whisper_loss=0.1111, over 17329.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01083, ecapa_loss=0.0001706, whisper_loss=0.09285, over 3947745.63 frames. ], batch size: 65, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:14:12,328 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 22:14:21,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-08-12 22:14:23,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1853180.0, ans=0.2 2024-08-12 22:14:32,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1853280.0, ans=0.125 2024-08-12 22:14:36,075 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.651e+01 3.000e+01 3.420e+01 5.421e+01, threshold=6.000e+01, percent-clipped=0.0 2024-08-12 22:14:45,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1853280.0, ans=0.125 2024-08-12 22:14:59,945 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 22:15:05,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1853480.0, ans=0.125 2024-08-12 22:15:08,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.76 vs. limit=5.0 2024-08-12 22:15:19,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11450, loss[loss=0.1259, beats_loss=0.01057, ecapa_loss=0.000158, whisper_loss=0.1137, over 21905.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01083, ecapa_loss=0.0001718, whisper_loss=0.09284, over 3906781.83 frames. 
], batch size: 84, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:15:48,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1853680.0, ans=0.95 2024-08-12 22:15:58,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1853780.0, ans=0.125 2024-08-12 22:16:07,213 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 22:16:41,177 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11500, loss[loss=0.1128, beats_loss=0.01084, ecapa_loss=0.0001799, whisper_loss=0.1001, over 22251.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01078, ecapa_loss=0.0001722, whisper_loss=0.09269, over 3894487.03 frames. ], batch size: 89, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:16:41,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1854080.0, ans=0.0 2024-08-12 22:16:41,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-08-12 22:16:54,287 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 22:16:54,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1854080.0, ans=0.1 2024-08-12 22:17:15,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1854280.0, ans=0.125 2024-08-12 22:17:17,290 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.445e+01 2.643e+01 2.952e+01 4.086e+01, threshold=5.286e+01, percent-clipped=0.0 2024-08-12 22:17:22,349 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
38 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 22:17:30,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1854380.0, ans=0.125 2024-08-12 22:17:36,393 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 22:17:38,254 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-12 22:17:39,884 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 22:17:41,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5 2024-08-12 22:17:44,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1854480.0, ans=0.0 2024-08-12 22:17:51,265 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 22:17:52,934 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 38 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 22:17:56,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1854480.0, ans=0.0 2024-08-12 22:18:03,960 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11550, loss[loss=0.08144, beats_loss=0.01149, ecapa_loss=0.0001624, whisper_loss=0.06833, over 15497.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01074, ecapa_loss=0.0001717, whisper_loss=0.09299, over 3904728.17 frames. 
], batch size: 61, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:18:04,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1854580.0, ans=0.0 2024-08-12 22:18:15,808 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.01 vs. limit=22.5 2024-08-12 22:18:18,836 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 22:18:29,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1854680.0, ans=0.5 2024-08-12 22:18:37,901 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 22:18:39,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1854780.0, ans=0.1 2024-08-12 22:18:42,833 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 22:19:04,165 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-12 22:19:11,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1854980.0, ans=0.5 2024-08-12 22:19:14,050 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 22:19:18,015 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2024-08-12 22:19:24,471 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11600, loss[loss=0.109, beats_loss=0.01025, ecapa_loss=0.0002155, whisper_loss=0.09656, over 20035.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.01069, ecapa_loss=0.0001719, whisper_loss=0.09307, over 3885397.49 frames. ], batch size: 87, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:19:35,053 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.127e-02 2024-08-12 22:20:00,205 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.514e+01 2.737e+01 3.107e+01 4.746e+01, threshold=5.475e+01, percent-clipped=0.0 2024-08-12 22:20:05,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=12.0 2024-08-12 22:20:07,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1855280.0, ans=0.1 2024-08-12 22:20:15,097 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 22:20:23,878 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-12 22:20:33,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1855480.0, ans=0.0 2024-08-12 22:20:35,483 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 22:20:43,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11650, loss[loss=0.09365, beats_loss=0.01204, ecapa_loss=0.0001672, whisper_loss=0.07994, over 22402.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0107, ecapa_loss=0.0001717, whisper_loss=0.093, over 3903138.25 frames. 
], batch size: 92, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:20:50,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1855580.0, ans=0.015 2024-08-12 22:20:54,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1855580.0, ans=0.0 2024-08-12 22:20:56,389 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 22:21:08,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1855680.0, ans=0.0 2024-08-12 22:21:17,042 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 22:21:23,619 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 22:21:27,000 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 22:21:28,275 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 22:21:31,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-08-12 22:21:43,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1855880.0, ans=0.1 2024-08-12 22:21:46,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1855980.0, ans=0.1 2024-08-12 22:21:53,332 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.86 vs. 
limit=15.0 2024-08-12 22:22:03,158 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11700, loss[loss=0.0744, beats_loss=0.01014, ecapa_loss=0.0001843, whisper_loss=0.06243, over 16362.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01088, ecapa_loss=0.000171, whisper_loss=0.09171, over 3896525.94 frames. ], batch size: 67, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:22:03,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1856080.0, ans=0.125 2024-08-12 22:22:20,543 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-12 22:22:27,819 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 29 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-12 22:22:31,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1856180.0, ans=0.2 2024-08-12 22:22:36,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1856280.0, ans=0.0 2024-08-12 22:22:39,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.485e+01 2.712e+01 3.027e+01 7.497e+01, threshold=5.424e+01, percent-clipped=1.0 2024-08-12 22:22:56,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1856380.0, ans=0.0 2024-08-12 22:23:01,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1856380.0, ans=0.0 2024-08-12 22:23:05,670 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
25 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 22:23:13,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1856480.0, ans=0.0 2024-08-12 22:23:14,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0 2024-08-12 22:23:14,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1856480.0, ans=0.125 2024-08-12 22:23:23,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1856480.0, ans=0.125 2024-08-12 22:23:27,311 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11750, loss[loss=0.08159, beats_loss=0.01401, ecapa_loss=0.0001517, whisper_loss=0.06606, over 19014.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01089, ecapa_loss=0.0001714, whisper_loss=0.09213, over 3908819.33 frames. ], batch size: 79, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:23:45,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=12.0 2024-08-12 22:23:51,627 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 22:23:54,607 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
14 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-12 22:24:27,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1856880.0, ans=0.125 2024-08-12 22:24:34,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1856980.0, ans=0.1 2024-08-12 22:24:42,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1856980.0, ans=0.0 2024-08-12 22:24:44,514 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-12 22:24:45,738 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11800, loss[loss=0.09869, beats_loss=0.007898, ecapa_loss=0.0001923, whisper_loss=0.08887, over 15590.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01086, ecapa_loss=0.0001709, whisper_loss=0.09245, over 3889537.81 frames. ], batch size: 58, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:24:56,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1857080.0, ans=0.0 2024-08-12 22:25:21,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.559e+01 2.833e+01 3.342e+01 5.764e+01, threshold=5.666e+01, percent-clipped=1.0 2024-08-12 22:25:32,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1857280.0, ans=0.125 2024-08-12 22:25:36,036 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.608e-02 2024-08-12 22:25:37,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1857380.0, ans=0.125 2024-08-12 22:25:45,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, 
batch_count=1857380.0, ans=0.125 2024-08-12 22:25:47,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1857380.0, ans=0.1 2024-08-12 22:25:48,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1857380.0, ans=10.0 2024-08-12 22:25:50,780 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 22:25:51,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=12.0 2024-08-12 22:26:06,702 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11850, loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001617, whisper_loss=0.09148, over 24017.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01084, ecapa_loss=0.0001702, whisper_loss=0.09285, over 3880548.78 frames. ], batch size: 94, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:26:21,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1857680.0, ans=0.1 2024-08-12 22:26:29,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1857680.0, ans=0.125 2024-08-12 22:26:36,727 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 22:26:50,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1857780.0, ans=0.0 2024-08-12 22:27:22,089 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11900, loss[loss=0.09158, beats_loss=0.01303, ecapa_loss=0.0001757, whisper_loss=0.07679, over 21430.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.000171, whisper_loss=0.09239, over 3914350.32 frames. 
], batch size: 92, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:27:34,872 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 11 from Vox, 42 fro AS 2024-08-12 22:27:41,559 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2024-08-12 22:27:52,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1858280.0, ans=0.2 2024-08-12 22:27:52,925 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.540e+01 2.783e+01 3.070e+01 4.680e+01, threshold=5.566e+01, percent-clipped=0.0 2024-08-12 22:28:00,263 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 22:28:26,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1858480.0, ans=0.2 2024-08-12 22:28:31,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 11950, loss[loss=0.09962, beats_loss=0.01187, ecapa_loss=0.0001329, whisper_loss=0.08642, over 20001.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001703, whisper_loss=0.09196, over 3869316.12 frames. ], batch size: 79, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:28:32,268 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 9 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 22:28:37,917 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 34 from Vox, 29 fro AS 2024-08-12 22:28:38,392 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. 
limit=15.0 2024-08-12 22:28:54,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1858680.0, ans=0.125 2024-08-12 22:28:56,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1858680.0, ans=0.0 2024-08-12 22:29:00,491 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 19 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 22:29:00,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1858780.0, ans=0.125 2024-08-12 22:29:11,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1858880.0, ans=0.125 2024-08-12 22:29:17,787 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-12 22:29:30,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1858980.0, ans=0.125 2024-08-12 22:29:36,843 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 22:29:37,806 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.28 vs. limit=15.0 2024-08-12 22:29:39,570 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12000, loss[loss=0.07932, beats_loss=0.01209, ecapa_loss=0.0001942, whisper_loss=0.06529, over 21286.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01093, ecapa_loss=0.0001705, whisper_loss=0.09092, over 3868560.80 frames. 
], batch size: 92, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:29:39,571 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-12 22:30:19,812 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on ASR_libri: loss=0.2562, beats_loss=0, ecapa_loss=0.0005805, whisper_loss=0.2504, over 922467.00 frames. 2024-08-12 22:30:37,885 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on SV_voxceleb1: loss=0.004691, beats_loss=0, ecapa_loss=0.0004691, whisper_loss=0, over 939242.00 frames. 2024-08-12 22:32:33,588 INFO [train_multi_KD3.py:1149] (2/4) Epoch 13, validation on AT_audioset: loss=0.02411, beats_loss=0.02411, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 22:32:33,592 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-12 22:32:48,043 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 22:32:49,713 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 22:33:04,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.535e+01 2.857e+01 3.270e+01 5.667e+01, threshold=5.714e+01, percent-clipped=0.0 2024-08-12 22:33:22,099 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
28 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-12 22:33:37,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1859480.0, ans=0.5 2024-08-12 22:33:38,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1859480.0, ans=0.0 2024-08-12 22:33:40,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1859480.0, ans=0.125 2024-08-12 22:33:42,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1859480.0, ans=0.125 2024-08-12 22:33:45,295 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-12 22:33:46,417 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12050, loss[loss=0.09576, beats_loss=0.01365, ecapa_loss=0.0001179, whisper_loss=0.08092, over 20278.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01085, ecapa_loss=0.0001694, whisper_loss=0.09152, over 3854217.53 frames. ], batch size: 77, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:33:49,911 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 22:34:02,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1859680.0, ans=0.125 2024-08-12 22:34:16,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1859780.0, ans=0.0 2024-08-12 22:34:38,093 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 15 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 22:34:38,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.05 vs. 
limit=15.0 2024-08-12 22:34:58,253 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12100, loss[loss=0.08623, beats_loss=0.01058, ecapa_loss=0.0002002, whisper_loss=0.07365, over 18269.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001706, whisper_loss=0.09125, over 3839583.77 frames. ], batch size: 78, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:35:03,834 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 39 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-12 22:35:27,874 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.529e+01 2.799e+01 3.028e+01 6.026e+01, threshold=5.598e+01, percent-clipped=1.0 2024-08-12 22:35:48,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1860380.0, ans=0.125 2024-08-12 22:35:50,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1860380.0, ans=0.125 2024-08-12 22:36:02,681 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 22:36:08,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12150, loss[loss=0.1085, beats_loss=0.0107, ecapa_loss=0.0002024, whisper_loss=0.09573, over 19883.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0108, ecapa_loss=0.0001719, whisper_loss=0.09167, over 3846496.18 frames. ], batch size: 86, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:36:29,243 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 22:36:34,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1860780.0, ans=0.125 2024-08-12 22:36:37,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1860780.0, ans=0.125 2024-08-12 22:36:47,983 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.08 vs. limit=15.0 2024-08-12 22:36:49,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1860880.0, ans=0.125 2024-08-12 22:37:00,843 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 21 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 22:37:03,394 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 22:37:14,713 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 22:37:18,666 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12200, loss[loss=0.1136, beats_loss=0.008617, ecapa_loss=0.0001737, whisper_loss=0.1033, over 15723.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01079, ecapa_loss=0.0001724, whisper_loss=0.09169, over 3841737.51 frames. ], batch size: 61, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:37:19,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1861080.0, ans=0.1 2024-08-12 22:37:20,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1861080.0, ans=0.2 2024-08-12 22:37:46,452 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. 
limit=15.0 2024-08-12 22:37:49,467 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.525e+01 2.741e+01 3.168e+01 5.471e+01, threshold=5.482e+01, percent-clipped=0.0 2024-08-12 22:37:57,556 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=12.0 2024-08-12 22:38:01,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1861380.0, ans=0.0 2024-08-12 22:38:29,550 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12250, loss[loss=0.07414, beats_loss=0.01092, ecapa_loss=0.0001739, whisper_loss=0.06148, over 13402.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01075, ecapa_loss=0.0001721, whisper_loss=0.09171, over 3810929.00 frames. ], batch size: 55, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:38:49,790 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 22:39:05,246 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.31 vs. limit=22.5 2024-08-12 22:39:35,557 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 22:39:38,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1861980.0, ans=0.125 2024-08-12 22:39:38,879 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.89 vs. 
limit=22.5 2024-08-12 22:39:39,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1862080.0, ans=0.125 2024-08-12 22:39:41,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12300, loss[loss=0.1005, beats_loss=0.01048, ecapa_loss=0.0001449, whisper_loss=0.0886, over 14845.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01077, ecapa_loss=0.0001717, whisper_loss=0.09147, over 3818024.01 frames. ], batch size: 58, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:39:46,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1862080.0, ans=0.0 2024-08-12 22:39:48,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1862080.0, ans=0.125 2024-08-12 22:39:54,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1862180.0, ans=0.2 2024-08-12 22:39:57,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1862180.0, ans=0.0 2024-08-12 22:40:06,111 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 22:40:09,633 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-12 22:40:11,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.506e+01 2.717e+01 3.049e+01 5.234e+01, threshold=5.434e+01, percent-clipped=0.0 2024-08-12 22:40:34,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1862480.0, ans=0.125 2024-08-12 22:40:48,577 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12350, loss[loss=0.08373, beats_loss=0.01114, ecapa_loss=0.000176, whisper_loss=0.07084, over 13531.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01085, ecapa_loss=0.0001729, whisper_loss=0.09183, over 3858825.83 frames. ], batch size: 53, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:40:57,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1862580.0, ans=0.0 2024-08-12 22:41:02,895 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 22:41:14,130 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 22:41:14,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.70 vs. limit=10.0 2024-08-12 22:41:26,626 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 22:41:44,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=1862980.0, ans=15.0 2024-08-12 22:41:53,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1862980.0, ans=0.2 2024-08-12 22:41:53,375 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=15.0 2024-08-12 22:41:59,256 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12400, loss[loss=0.1232, beats_loss=0.01097, ecapa_loss=0.0001431, whisper_loss=0.1108, over 24527.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01085, ecapa_loss=0.0001726, whisper_loss=0.09213, over 3897371.39 frames. 
], batch size: 92, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:42:29,654 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.619e+01 2.853e+01 3.347e+01 1.216e+02, threshold=5.705e+01, percent-clipped=2.0 2024-08-12 22:42:30,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1863280.0, ans=0.125 2024-08-12 22:42:31,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1863280.0, ans=0.0 2024-08-12 22:42:50,661 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 22:43:09,012 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12450, loss[loss=0.08916, beats_loss=0.01127, ecapa_loss=0.0001984, whisper_loss=0.07591, over 21193.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01083, ecapa_loss=0.000173, whisper_loss=0.09196, over 3887520.27 frames. ], batch size: 90, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:43:12,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1863580.0, ans=0.07 2024-08-12 22:43:15,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1863580.0, ans=0.0 2024-08-12 22:43:15,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1863580.0, ans=0.0 2024-08-12 22:43:20,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.82 vs. limit=10.0 2024-08-12 22:43:24,170 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 29 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 22:43:42,338 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
26 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 22:43:42,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1863780.0, ans=0.0 2024-08-12 22:44:19,525 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12500, loss[loss=0.09177, beats_loss=0.0131, ecapa_loss=0.0001621, whisper_loss=0.07705, over 17099.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001728, whisper_loss=0.09131, over 3856964.87 frames. ], batch size: 70, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:44:21,386 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 22:44:22,482 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 22:44:32,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1864180.0, ans=0.2 2024-08-12 22:44:32,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1864180.0, ans=0.0 2024-08-12 22:44:34,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1864180.0, ans=0.125 2024-08-12 22:44:41,497 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-12 22:44:45,620 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 22:44:49,301 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.439e+01 2.730e+01 3.074e+01 7.978e+01, threshold=5.460e+01, percent-clipped=1.0 2024-08-12 22:44:56,840 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
29 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 22:44:57,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=1864280.0, ans=0.2 2024-08-12 22:45:09,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1864380.0, ans=0.125 2024-08-12 22:45:21,317 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 22:45:22,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1864480.0, ans=0.1 2024-08-12 22:45:24,040 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 22:45:26,510 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12550, loss[loss=0.1154, beats_loss=0.01134, ecapa_loss=0.0001507, whisper_loss=0.1025, over 17704.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01093, ecapa_loss=0.0001728, whisper_loss=0.09076, over 3871411.49 frames. ], batch size: 67, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:45:28,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1864580.0, ans=0.2 2024-08-12 22:45:41,623 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-12 22:45:43,946 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=15.0 2024-08-12 22:45:44,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1864680.0, ans=0.125 2024-08-12 22:45:51,461 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 22:45:59,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1864780.0, ans=0.1 2024-08-12 22:46:04,131 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 22:46:06,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0 2024-08-12 22:46:16,087 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 22:46:20,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1864980.0, ans=0.0 2024-08-12 22:46:21,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1864980.0, ans=0.125 2024-08-12 22:46:33,518 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12600, loss[loss=0.1164, beats_loss=0.009634, ecapa_loss=0.0001587, whisper_loss=0.1052, over 22569.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01088, ecapa_loss=0.0001718, whisper_loss=0.09165, over 3880718.13 frames. ], batch size: 88, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:47:03,535 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.534e+01 2.817e+01 3.269e+01 5.497e+01, threshold=5.633e+01, percent-clipped=1.0 2024-08-12 22:47:21,456 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 22:47:21,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1865380.0, ans=0.125 2024-08-12 22:47:21,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1865380.0, ans=0.1 2024-08-12 22:47:27,036 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 22:47:30,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1865480.0, ans=0.0 2024-08-12 22:47:31,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.47 vs. limit=22.5 2024-08-12 22:47:32,859 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.58 vs. limit=10.0 2024-08-12 22:47:39,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1865480.0, ans=0.125 2024-08-12 22:47:42,023 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12650, loss[loss=0.1256, beats_loss=0.009306, ecapa_loss=0.000152, whisper_loss=0.1147, over 23386.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01093, ecapa_loss=0.0001722, whisper_loss=0.0914, over 3865896.14 frames. ], batch size: 89, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:47:49,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1865580.0, ans=0.125 2024-08-12 22:47:59,923 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
18 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-12 22:48:01,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1865680.0, ans=0.0 2024-08-12 22:48:09,982 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 22:48:12,615 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 22:48:37,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1865980.0, ans=0.2 2024-08-12 22:48:50,301 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12700, loss[loss=0.1101, beats_loss=0.01086, ecapa_loss=0.0001762, whisper_loss=0.09751, over 20734.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01089, ecapa_loss=0.0001726, whisper_loss=0.09134, over 3846079.83 frames. ], batch size: 84, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:49:13,555 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-12 22:49:21,588 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.169e+01 2.468e+01 2.692e+01 3.051e+01 4.394e+01, threshold=5.384e+01, percent-clipped=0.0 2024-08-12 22:49:30,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1866280.0, ans=0.0 2024-08-12 22:49:38,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1866380.0, ans=0.125 2024-08-12 22:49:59,654 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12750, loss[loss=0.08404, beats_loss=0.008352, ecapa_loss=0.000162, whisper_loss=0.07407, over 15119.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01085, ecapa_loss=0.0001733, whisper_loss=0.09195, over 3860036.54 frames. 
], batch size: 57, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:50:06,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1866580.0, ans=0.125 2024-08-12 22:50:17,753 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 37 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 22:50:35,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1866780.0, ans=0.125 2024-08-12 22:50:57,931 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 22:51:05,698 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12800, loss[loss=0.1181, beats_loss=0.008716, ecapa_loss=0.0001817, whisper_loss=0.1076, over 21533.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01084, ecapa_loss=0.000173, whisper_loss=0.09216, over 3881370.58 frames. ], batch size: 84, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:51:11,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1867080.0, ans=0.125 2024-08-12 22:51:24,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2024-08-12 22:51:26,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. 
limit=15.0 2024-08-12 22:51:34,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1867280.0, ans=0.125 2024-08-12 22:51:35,464 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.415e+01 2.675e+01 2.893e+01 6.675e+01, threshold=5.350e+01, percent-clipped=1.0 2024-08-12 22:51:47,755 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 20 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 22:51:50,571 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 22:52:05,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1867480.0, ans=0.1 2024-08-12 22:52:12,143 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 22:52:13,120 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12850, loss[loss=0.1112, beats_loss=0.01052, ecapa_loss=0.0002106, whisper_loss=0.09857, over 21826.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01092, ecapa_loss=0.000173, whisper_loss=0.09083, over 3884838.05 frames. ], batch size: 89, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:52:32,304 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
25 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 22:52:39,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1867780.0, ans=0.2 2024-08-12 22:52:44,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1867780.0, ans=0.0 2024-08-12 22:52:45,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1867780.0, ans=0.125 2024-08-12 22:53:06,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1867980.0, ans=0.125 2024-08-12 22:53:15,236 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-12 22:53:18,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1867980.0, ans=0.0 2024-08-12 22:53:20,089 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 22:53:23,065 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12900, loss[loss=0.091, beats_loss=0.01445, ecapa_loss=0.0001758, whisper_loss=0.07479, over 21725.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01095, ecapa_loss=0.0001727, whisper_loss=0.09065, over 3894148.93 frames. 
], batch size: 91, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:53:34,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1868080.0, ans=0.0 2024-08-12 22:53:40,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1868180.0, ans=0.2 2024-08-12 22:53:51,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1868280.0, ans=0.0 2024-08-12 22:53:53,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.434e+01 2.743e+01 3.168e+01 4.693e+01, threshold=5.486e+01, percent-clipped=0.0 2024-08-12 22:53:53,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1868280.0, ans=0.1 2024-08-12 22:54:05,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1868380.0, ans=0.125 2024-08-12 22:54:07,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2024-08-12 22:54:27,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1868480.0, ans=0.125 2024-08-12 22:54:32,666 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 12950, loss[loss=0.1006, beats_loss=0.0131, ecapa_loss=0.0001362, whisper_loss=0.08614, over 21807.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01096, ecapa_loss=0.0001717, whisper_loss=0.09042, over 3886838.92 frames. ], batch size: 89, lr: 4.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 22:54:32,812 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 22:54:42,437 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
31 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 22:54:50,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1868680.0, ans=0.125 2024-08-12 22:54:51,396 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 22:54:53,509 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-12 22:55:04,888 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 26 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 22:55:07,044 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=22.5 2024-08-12 22:55:25,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1868980.0, ans=0.0 2024-08-12 22:55:27,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1868980.0, ans=0.1 2024-08-12 22:55:30,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1868980.0, ans=0.125 2024-08-12 22:55:38,219 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0 2024-08-12 22:55:39,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1869080.0, ans=0.2 2024-08-12 22:55:40,006 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13000, loss[loss=0.09281, beats_loss=0.01296, ecapa_loss=0.0001851, whisper_loss=0.078, over 20885.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.0001716, whisper_loss=0.09105, over 3893391.45 frames. 
], batch size: 90, lr: 4.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 22:55:47,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1869080.0, ans=0.125 2024-08-12 22:55:48,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1869080.0, ans=0.0 2024-08-12 22:56:09,166 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.487e+01 2.816e+01 3.426e+01 7.138e+01, threshold=5.633e+01, percent-clipped=2.0 2024-08-12 22:56:10,593 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 22:56:14,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1869280.0, ans=0.0 2024-08-12 22:56:26,472 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 22:56:29,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1869380.0, ans=0.0 2024-08-12 22:56:30,537 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-12 22:56:40,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1869480.0, ans=0.0 2024-08-12 22:56:46,940 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13050, loss[loss=0.08438, beats_loss=0.01332, ecapa_loss=0.0002172, whisper_loss=0.06888, over 16923.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0109, ecapa_loss=0.0001726, whisper_loss=0.0912, over 3888408.60 frames. ], batch size: 76, lr: 4.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 22:56:58,141 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
28 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 22:57:06,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1869680.0, ans=0.0 2024-08-12 22:57:06,858 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-12 22:57:12,918 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 24 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-12 22:57:16,139 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-08-12 22:57:17,061 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 22:57:17,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1869780.0, ans=0.0 2024-08-12 22:57:36,426 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 22:57:53,299 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13100, loss[loss=0.06895, beats_loss=0.01433, ecapa_loss=0.0001207, whisper_loss=0.05341, over 18196.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01099, ecapa_loss=0.0001712, whisper_loss=0.09077, over 3874452.90 frames. ], batch size: 74, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:57:53,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1870080.0, ans=0.0 2024-08-12 22:57:54,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1870080.0, ans=0.2 2024-08-12 22:58:05,189 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 22:58:08,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1870180.0, ans=0.1 2024-08-12 22:58:08,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2024-08-12 22:58:23,881 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.487e+01 2.739e+01 3.111e+01 4.282e+01, threshold=5.479e+01, percent-clipped=0.0 2024-08-12 22:58:29,309 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 22:58:39,034 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 22:58:45,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-12 22:59:00,267 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13150, loss[loss=0.09982, beats_loss=0.01155, ecapa_loss=0.0001565, whisper_loss=0.0867, over 16626.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01095, ecapa_loss=0.0001719, whisper_loss=0.09121, over 3871710.12 frames. ], batch size: 67, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:59:13,918 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 22:59:17,929 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 33 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 22:59:24,937 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 22:59:57,427 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 23:00:01,960 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 23:00:04,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1870980.0, ans=0.0 2024-08-12 23:00:06,657 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13200, loss[loss=0.08446, beats_loss=0.01233, ecapa_loss=0.0001689, whisper_loss=0.07044, over 18701.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01093, ecapa_loss=0.0001731, whisper_loss=0.09168, over 3903862.53 frames. ], batch size: 76, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:00:14,566 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 23:00:36,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.560e+01 2.764e+01 3.178e+01 9.126e+01, threshold=5.529e+01, percent-clipped=1.0 2024-08-12 23:00:52,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1871380.0, ans=0.0 2024-08-12 23:01:02,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1871480.0, ans=0.0 2024-08-12 23:01:04,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1871480.0, ans=0.0 2024-08-12 23:01:05,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1871480.0, ans=0.2 2024-08-12 23:01:11,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1871580.0, ans=0.125 2024-08-12 23:01:12,616 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13250, loss[loss=0.1098, beats_loss=0.01195, ecapa_loss=0.0001105, whisper_loss=0.09671, over 18808.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01089, ecapa_loss=0.0001734, whisper_loss=0.09144, over 3881742.88 frames. ], batch size: 69, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:01:14,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1871580.0, ans=0.0 2024-08-12 23:01:19,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1871580.0, ans=0.0 2024-08-12 23:01:26,850 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 23:01:36,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1871680.0, ans=0.125 2024-08-12 23:02:05,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.67 vs. limit=10.0 2024-08-12 23:02:11,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1871980.0, ans=0.0 2024-08-12 23:02:20,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13300, loss[loss=0.08648, beats_loss=0.01145, ecapa_loss=0.0002087, whisper_loss=0.07295, over 19721.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0109, ecapa_loss=0.0001727, whisper_loss=0.09181, over 3908307.09 frames. ], batch size: 85, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:02:25,528 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 23:02:25,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1872080.0, ans=0.1 2024-08-12 23:02:52,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.491e+01 2.756e+01 2.982e+01 7.499e+01, threshold=5.512e+01, percent-clipped=1.0 2024-08-12 23:02:56,812 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 23:02:58,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.69 vs. limit=22.5 2024-08-12 23:02:59,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1872280.0, ans=0.125 2024-08-12 23:03:04,833 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-12 23:03:12,848 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-12 23:03:15,604 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 29 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-12 23:03:28,646 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13350, loss[loss=0.1262, beats_loss=0.006712, ecapa_loss=0.000146, whisper_loss=0.118, over 17338.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01091, ecapa_loss=0.0001715, whisper_loss=0.0921, over 3870597.83 frames. ], batch size: 61, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:03:35,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1872580.0, ans=0.0 2024-08-12 23:03:51,681 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 23:03:53,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1872680.0, ans=0.1 2024-08-12 23:03:55,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1872780.0, ans=0.025 2024-08-12 23:03:59,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1872780.0, ans=10.0 2024-08-12 23:04:02,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2024-08-12 23:04:07,716 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-12 23:04:10,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1872880.0, ans=0.125 2024-08-12 23:04:13,975 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 20 from LS+wenet, 33 from Vox, 37 fro AS 2024-08-12 23:04:25,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1872980.0, ans=0.125 2024-08-12 23:04:35,481 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13400, loss[loss=0.09457, beats_loss=0.01078, ecapa_loss=0.000168, whisper_loss=0.08211, over 16484.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01101, ecapa_loss=0.0001709, whisper_loss=0.09144, over 3879228.92 frames. 
], batch size: 67, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:05:06,161 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.402e+01 2.808e+01 3.201e+01 5.167e+01, threshold=5.616e+01, percent-clipped=0.0 2024-08-12 23:05:16,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1873380.0, ans=0.125 2024-08-12 23:05:29,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.06 vs. limit=5.0 2024-08-12 23:05:39,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1873480.0, ans=0.05 2024-08-12 23:05:41,450 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13450, loss[loss=0.1064, beats_loss=0.009394, ecapa_loss=0.0001941, whisper_loss=0.09502, over 22403.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01097, ecapa_loss=0.0001716, whisper_loss=0.09117, over 3882166.49 frames. ], batch size: 92, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:05:47,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1873580.0, ans=0.0 2024-08-12 23:05:52,372 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0 2024-08-12 23:05:59,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1873680.0, ans=0.125 2024-08-12 23:06:00,912 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 23:06:16,838 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 23:06:18,142 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
25 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 23:06:30,062 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 23:06:41,831 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-12 23:06:47,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1874080.0, ans=0.1 2024-08-12 23:06:48,025 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13500, loss[loss=0.1053, beats_loss=0.01013, ecapa_loss=0.0002149, whisper_loss=0.09301, over 20675.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001719, whisper_loss=0.09163, over 3865717.42 frames. ], batch size: 88, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:06:48,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1874080.0, ans=0.1 2024-08-12 23:06:59,469 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 23:07:11,680 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 23:07:17,164 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 23:07:19,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.452e+01 2.723e+01 3.030e+01 4.696e+01, threshold=5.446e+01, percent-clipped=0.0 2024-08-12 23:07:28,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-12 23:07:30,210 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
22 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 23:07:30,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1874380.0, ans=0.0 2024-08-12 23:07:36,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1874380.0, ans=0.1 2024-08-12 23:07:37,063 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 23:07:41,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1874480.0, ans=0.0 2024-08-12 23:07:55,290 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13550, loss[loss=0.1105, beats_loss=0.01057, ecapa_loss=0.0001678, whisper_loss=0.0983, over 16129.00 frames. ], tot_loss[loss=0.104, beats_loss=0.011, ecapa_loss=0.0001705, whisper_loss=0.09134, over 3868898.31 frames. ], batch size: 62, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:08:21,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0 2024-08-12 23:08:31,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1874780.0, ans=0.125 2024-08-12 23:08:34,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1874880.0, ans=0.05 2024-08-12 23:08:37,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1874880.0, ans=0.1 2024-08-12 23:08:42,050 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 23:08:42,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1874880.0, ans=0.1 2024-08-12 23:08:58,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1874980.0, ans=0.125 2024-08-12 23:09:02,088 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13600, loss[loss=0.1085, beats_loss=0.0115, ecapa_loss=0.0001437, whisper_loss=0.09559, over 22571.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.0001695, whisper_loss=0.09191, over 3898622.65 frames. ], batch size: 87, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:09:09,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.29 vs. limit=22.5 2024-08-12 23:09:10,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1875080.0, ans=0.125 2024-08-12 23:09:22,149 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 23:09:32,499 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.462e+01 2.883e+01 3.310e+01 7.463e+01, threshold=5.766e+01, percent-clipped=1.0 2024-08-12 23:09:32,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1875280.0, ans=0.1 2024-08-12 23:09:56,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1875480.0, ans=0.125 2024-08-12 23:10:05,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1875480.0, ans=0.125 2024-08-12 23:10:06,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1875580.0, ans=0.1 2024-08-12 23:10:07,321 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13650, loss[loss=0.08439, beats_loss=0.01133, ecapa_loss=0.0001684, whisper_loss=0.07137, over 15766.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01097, ecapa_loss=0.0001707, whisper_loss=0.09241, over 3902217.39 frames. ], batch size: 63, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:10:13,173 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 23:10:33,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1875780.0, ans=0.125 2024-08-12 23:10:35,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1875780.0, ans=0.125 2024-08-12 23:10:47,571 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 25 from Vox, 36 from AS
2024-08-12 23:10:47,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1875880.0, ans=0.125
2024-08-12 23:10:50,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1875880.0, ans=0.2
2024-08-12 23:11:02,549 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 26 from Vox, 33 from AS
2024-08-12 23:11:06,685 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 21 from Vox, 33 from AS
2024-08-12 23:11:14,546 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13700, loss[loss=0.1178, beats_loss=0.01039, ecapa_loss=0.0001724, whisper_loss=0.1057, over 16518.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01101, ecapa_loss=0.0001699, whisper_loss=0.09204, over 3870743.48 frames. ], batch size: 65, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:11:31,087 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 22 from Vox, 26 from AS
2024-08-12 23:11:37,941 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 17 from Vox, 45 from AS
2024-08-12 23:11:38,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1876180.0, ans=0.09899494936611666
2024-08-12 23:11:44,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1876280.0, ans=0.0
2024-08-12 23:11:44,999 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.467e+01 2.777e+01 3.137e+01 6.258e+01, threshold=5.554e+01, percent-clipped=1.0
2024-08-12 23:11:46,525 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
31 from LS+wenet, 21 from Vox, 36 from AS
2024-08-12 23:12:11,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1876480.0, ans=0.125
2024-08-12 23:12:21,825 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13750, loss[loss=0.09486, beats_loss=0.01267, ecapa_loss=0.0001201, whisper_loss=0.08099, over 15059.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001706, whisper_loss=0.09182, over 3877568.06 frames. ], batch size: 56, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:12:23,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1876580.0, ans=0.1
2024-08-12 23:12:27,927 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 24 from Vox, 29 from AS
2024-08-12 23:12:37,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1876680.0, ans=0.1
2024-08-12 23:12:57,590 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 20 from Vox, 34 from AS
2024-08-12 23:12:58,659 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 from AS
2024-08-12 23:12:59,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1876780.0, ans=0.125
2024-08-12 23:13:01,771 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 from AS
2024-08-12 23:13:31,992 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13800, loss[loss=0.1075, beats_loss=0.01025, ecapa_loss=0.0001687, whisper_loss=0.09559, over 22437.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01093, ecapa_loss=0.0001692, whisper_loss=0.09179, over 3880026.00 frames.
], batch size: 89, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:13:42,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1877080.0, ans=0.2
2024-08-12 23:13:46,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1877180.0, ans=0.125
2024-08-12 23:14:06,714 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.452e+01 2.663e+01 3.049e+01 4.287e+01, threshold=5.326e+01, percent-clipped=0.0
2024-08-12 23:14:10,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1877280.0, ans=0.5
2024-08-12 23:14:47,592 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13850, loss[loss=0.1005, beats_loss=0.0102, ecapa_loss=0.0001879, whisper_loss=0.08844, over 20961.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01081, ecapa_loss=0.0001706, whisper_loss=0.09245, over 3882502.99 frames. ], batch size: 87, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:14:54,083 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 from AS
2024-08-12 23:15:19,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1877780.0, ans=0.125
2024-08-12 23:15:31,168 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 18 from LS+wenet, 27 from Vox, 38 from AS
2024-08-12 23:15:35,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1877880.0, ans=0.035
2024-08-12 23:15:55,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1877980.0, ans=0.125
2024-08-12 23:16:00,133 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
25 from LS+wenet, 20 from Vox, 49 from AS
2024-08-12 23:16:03,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1878080.0, ans=0.07
2024-08-12 23:16:04,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13900, loss[loss=0.09002, beats_loss=0.01166, ecapa_loss=0.0001702, whisper_loss=0.07666, over 18991.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01084, ecapa_loss=0.0001703, whisper_loss=0.09257, over 3891522.20 frames. ], batch size: 76, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:16:18,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1878180.0, ans=0.0
2024-08-12 23:16:35,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1878280.0, ans=0.1
2024-08-12 23:16:38,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1878280.0, ans=0.125
2024-08-12 23:16:39,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.486e+01 2.775e+01 2.978e+01 4.704e+01, threshold=5.551e+01, percent-clipped=0.0
2024-08-12 23:16:45,877 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 14 from Vox, 32 from AS
2024-08-12 23:16:47,184 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 from AS
2024-08-12 23:16:54,635 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 from AS
2024-08-12 23:16:59,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0
2024-08-12 23:17:19,965 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 13950, loss[loss=0.09063, beats_loss=0.008995, ecapa_loss=0.0001665, whisper_loss=0.07997, over 18971.00 frames.
], tot_loss[loss=0.1051, beats_loss=0.01079, ecapa_loss=0.0001705, whisper_loss=0.09263, over 3892424.97 frames. ], batch size: 74, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:17:36,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1878680.0, ans=0.125
2024-08-12 23:17:45,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1878680.0, ans=0.0
2024-08-12 23:18:00,745 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 20 from Vox, 38 from AS
2024-08-12 23:18:02,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1878780.0, ans=0.125
2024-08-12 23:18:30,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=12.0
2024-08-12 23:18:35,367 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 14000, loss[loss=0.114, beats_loss=0.01021, ecapa_loss=0.0001722, whisper_loss=0.1021, over 19859.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.011, ecapa_loss=0.0001683, whisper_loss=0.09181, over 3900438.85 frames. ], batch size: 77, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:18:37,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0
2024-08-12 23:18:44,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1879080.0, ans=0.2
2024-08-12 23:18:56,448 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts.
21 from LS+wenet, 25 from Vox, 24 from AS
2024-08-12 23:18:58,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1879180.0, ans=0.125
2024-08-12 23:18:58,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.68 vs. limit=10.0
2024-08-12 23:19:09,753 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.518e+01 2.898e+01 3.200e+01 5.053e+01, threshold=5.795e+01, percent-clipped=0.0
2024-08-12 23:19:17,802 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.093e-01
2024-08-12 23:19:19,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1879380.0, ans=0.0
2024-08-12 23:19:22,499 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 from AS
2024-08-12 23:19:32,190 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 27 from Vox, 27 from AS
2024-08-12 23:19:45,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1879480.0, ans=0.125
2024-08-12 23:19:49,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1879480.0, ans=0.0
2024-08-12 23:19:51,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0
2024-08-12 23:19:51,644 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 14050, loss[loss=0.1118, beats_loss=0.0121, ecapa_loss=0.0001295, whisper_loss=0.09839, over 19680.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.0001674, whisper_loss=0.09183, over 3891386.79 frames.
], batch size: 76, lr: 4.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:19:54,752 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 from AS
2024-08-12 23:19:58,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1879580.0, ans=0.1
2024-08-12 23:20:07,456 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0
2024-08-12 23:20:19,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1879680.0, ans=0.125
2024-08-12 23:20:27,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1879780.0, ans=0.125
2024-08-12 23:20:28,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1879780.0, ans=0.125
2024-08-12 23:20:38,353 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.19 vs. limit=10.0
2024-08-12 23:20:42,190 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 23:21:08,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 14100, loss[loss=0.1093, beats_loss=0.01034, ecapa_loss=0.0001735, whisper_loss=0.09721, over 18344.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01098, ecapa_loss=0.0001673, whisper_loss=0.09194, over 3914808.56 frames.
], batch size: 75, lr: 4.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:21:14,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1880080.0, ans=0.95
2024-08-12 23:21:22,085 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 16 from Vox, 47 from AS
2024-08-12 23:21:29,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1880180.0, ans=0.125
2024-08-12 23:21:44,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.402e+01 2.759e+01 3.024e+01 5.678e+01, threshold=5.519e+01, percent-clipped=0.0
2024-08-12 23:21:49,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1880280.0, ans=0.1
2024-08-12 23:21:56,726 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 19 from LS+wenet, 31 from Vox, 41 from AS
2024-08-12 23:22:27,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 14150, loss[loss=0.1067, beats_loss=0.01094, ecapa_loss=0.0001853, whisper_loss=0.09392, over 22487.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.0001674, whisper_loss=0.09206, over 3903996.47 frames. ], batch size: 90, lr: 4.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:22:41,485 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0
2024-08-12 23:22:54,418 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts.
28 from LS+wenet, 20 from Vox, 37 from AS
2024-08-12 23:23:11,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1880780.0, ans=0.0
2024-08-12 23:23:17,245 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.611e+05
2024-08-12 23:23:22,036 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 from AS
2024-08-12 23:23:40,735 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 16 from Vox, 30 from AS
2024-08-12 23:23:46,658 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 14200, loss[loss=0.08767, beats_loss=0.01402, ecapa_loss=0.0001347, whisper_loss=0.0723, over 17183.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01093, ecapa_loss=0.0001674, whisper_loss=0.09199, over 3902356.84 frames. ], batch size: 68, lr: 4.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:23:55,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0
2024-08-12 23:24:01,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1881180.0, ans=0.125
2024-08-12 23:24:15,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.31 vs. limit=15.0
2024-08-12 23:24:19,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1881280.0, ans=0.125
2024-08-12 23:24:24,465 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.554e+01 2.881e+01 3.378e+01 7.854e+01, threshold=5.762e+01, percent-clipped=3.0
2024-08-12 23:24:24,726 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts.
17 from LS+wenet, 17 from Vox, 25 from AS
2024-08-12 23:24:27,783 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 28 from Vox, 27 from AS
2024-08-12 23:24:34,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1881380.0, ans=0.125
2024-08-12 23:24:48,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1881380.0, ans=0.125
2024-08-12 23:24:50,485 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 from AS
2024-08-12 23:25:07,531 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 14250, loss[loss=0.1159, beats_loss=0.01099, ecapa_loss=0.0001428, whisper_loss=0.1035, over 18975.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0109, ecapa_loss=0.0001673, whisper_loss=0.09165, over 3868941.65 frames. ], batch size: 71, lr: 4.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:25:17,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1881580.0, ans=0.07
2024-08-12 23:25:22,483 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 from AS
2024-08-12 23:25:25,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1881680.0, ans=0.125
2024-08-12 23:25:31,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1881680.0, ans=0.125
2024-08-12 23:25:47,519 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=15.0
2024-08-12 23:26:03,497 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
31 from LS+wenet, 28 from Vox, 28 from AS
2024-08-12 23:26:14,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1881980.0, ans=0.1
2024-08-12 23:26:23,084 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0
2024-08-12 23:26:24,037 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 14300, loss[loss=0.1121, beats_loss=0.01039, ecapa_loss=0.0001561, whisper_loss=0.1001, over 23641.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01093, ecapa_loss=0.0001674, whisper_loss=0.09086, over 3833383.25 frames. ], batch size: 94, lr: 4.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:26:29,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1882080.0, ans=0.0
2024-08-12 23:26:35,484 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.684e+01
2024-08-12 23:26:58,877 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.532e+01 2.791e+01 3.195e+01 4.924e+01, threshold=5.583e+01, percent-clipped=0.0
2024-08-12 23:27:07,641 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 17 from Vox, 24 from AS
2024-08-12 23:27:22,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1882480.0, ans=0.125
2024-08-12 23:27:27,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1882480.0, ans=0.0
2024-08-12 23:27:28,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.68 vs.
limit=12.0
2024-08-12 23:27:39,407 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 14350, loss[loss=0.1284, beats_loss=0.01055, ecapa_loss=0.0001318, whisper_loss=0.1166, over 23873.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01094, ecapa_loss=0.0001684, whisper_loss=0.0908, over 3845273.17 frames. ], batch size: 88, lr: 4.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:27:41,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1882580.0, ans=0.125
2024-08-12 23:27:45,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0
2024-08-12 23:27:54,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1882680.0, ans=0.0
2024-08-12 23:27:57,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1882680.0, ans=0.025
2024-08-12 23:28:07,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1882680.0, ans=0.125
2024-08-12 23:28:09,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=1882680.0, ans=22.5
2024-08-12 23:28:26,481 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts.
25 from LS+wenet, 15 from Vox, 37 from AS
2024-08-12 23:28:33,682 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.250e+00
2024-08-12 23:28:44,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1882980.0, ans=0.1
2024-08-12 23:28:58,887 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 14400, loss[loss=0.04115, beats_loss=0.01576, ecapa_loss=0.0001975, whisper_loss=0.02341, over 17379.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01092, ecapa_loss=0.0001686, whisper_loss=0.09126, over 3852343.60 frames. ], batch size: 78, lr: 4.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:29:04,814 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.74 vs. limit=10.0
2024-08-12 23:29:08,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1883080.0, ans=0.125
2024-08-12 23:29:20,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1883180.0, ans=0.1
2024-08-12 23:29:23,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1883180.0, ans=0.0
2024-08-12 23:29:27,519 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 from AS
2024-08-12 23:29:29,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1883280.0, ans=0.0
2024-08-12 23:29:33,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.540e+01 2.866e+01 3.197e+01 2.206e+02, threshold=5.732e+01, percent-clipped=2.0
2024-08-12 23:29:41,015 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts.
28 from LS+wenet, 20 from Vox, 33 from AS
2024-08-12 23:29:48,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1883380.0, ans=0.2
2024-08-12 23:30:01,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.48 vs. limit=15.0
2024-08-12 23:30:08,575 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS
2024-08-12 23:30:11,902 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 23:30:14,233 INFO [train_multi_KD3.py:1116] (2/4) Epoch 13, batch 14450, loss[loss=0.103, beats_loss=0.01141, ecapa_loss=0.0001712, whisper_loss=0.0899, over 15950.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001686, whisper_loss=0.09117, over 3856344.80 frames. ], batch size: 63, lr: 4.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:30:19,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1883580.0, ans=0.1
2024-08-12 23:30:23,920 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 15 from Vox, 30 from AS
2024-08-12 23:30:25,459 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS
2024-08-12 23:30:31,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0
2024-08-12 23:30:56,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1883780.0, ans=10.0
2024-08-12 23:31:09,713 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.81 vs.
limit=22.5
2024-08-12 23:31:54,480 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 0, loss[loss=0.09953, beats_loss=0.008691, ecapa_loss=0.0002246, whisper_loss=0.0886, over 13333.00 frames. ], tot_loss[loss=0.09953, beats_loss=0.008691, ecapa_loss=0.0002246, whisper_loss=0.0886, over 13333.00 frames. ], batch size: 54, lr: 4.58e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:31:54,481 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-12 23:32:30,902 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on ASR_libri: loss=0.2554, beats_loss=0, ecapa_loss=0.0005808, whisper_loss=0.2496, over 922467.00 frames.
2024-08-12 23:32:47,296 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on SV_voxceleb1: loss=0.004647, beats_loss=0, ecapa_loss=0.0004647, whisper_loss=0, over 939242.00 frames.
2024-08-12 23:34:33,423 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on AT_audioset: loss=0.02401, beats_loss=0.02401, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-12 23:34:33,426 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB
2024-08-12 23:34:38,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1883990.0, ans=0.04949747468305833
2024-08-12 23:34:50,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1883990.0, ans=0.125
2024-08-12 23:35:13,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0
2024-08-12 23:35:25,295 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.68 vs.
limit=12.0
2024-08-12 23:35:52,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1884290.0, ans=0.125
2024-08-12 23:35:52,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.26 vs. limit=15.0
2024-08-12 23:35:53,340 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.601e+01 2.897e+01 3.214e+01 1.891e+02, threshold=5.795e+01, percent-clipped=1.0
2024-08-12 23:35:54,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1884290.0, ans=0.0
2024-08-12 23:36:10,477 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 13 from Vox, 33 from AS
2024-08-12 23:36:35,580 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 50, loss[loss=0.113, beats_loss=0.006611, ecapa_loss=0.0002386, whisper_loss=0.104, over 20840.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01027, ecapa_loss=0.0001802, whisper_loss=0.08904, over 877445.70 frames. ], batch size: 84, lr: 4.58e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:37:13,524 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 23 from Vox, 31 from AS
2024-08-12 23:37:34,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.96 vs. limit=10.0
2024-08-12 23:37:53,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1884690.0, ans=0.125
2024-08-12 23:38:03,539 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts.
24 from LS+wenet, 23 from Vox, 26 from AS
2024-08-12 23:38:10,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1884790.0, ans=0.125
2024-08-12 23:38:27,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1884890.0, ans=0.125
2024-08-12 23:38:41,857 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 from AS
2024-08-12 23:38:43,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 100, loss[loss=0.1023, beats_loss=0.009793, ecapa_loss=0.0001668, whisper_loss=0.09085, over 18943.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01017, ecapa_loss=0.0001751, whisper_loss=0.08949, over 1534972.19 frames. ], batch size: 74, lr: 4.57e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:38:46,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1884990.0, ans=0.0
2024-08-12 23:39:14,960 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 38 from LS+wenet, 24 from Vox, 31 from AS
2024-08-12 23:39:18,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1885090.0, ans=0.07
2024-08-12 23:39:24,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1885090.0, ans=0.1
2024-08-12 23:39:34,962 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 16 from Vox, 45 from AS
2024-08-12 23:39:47,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1885190.0, ans=0.0
2024-08-12 23:39:47,460 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.58 vs.
limit=10.0
2024-08-12 23:39:50,762 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 41 from LS+wenet, 22 from Vox, 28 from AS
2024-08-12 23:39:54,189 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 from AS
2024-08-12 23:40:03,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=12.0
2024-08-12 23:40:11,080 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 from AS
2024-08-12 23:40:24,784 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.825e+01 3.064e+01 3.241e+01 4.540e+01, threshold=6.128e+01, percent-clipped=0.0
2024-08-12 23:41:11,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 150, loss[loss=0.09088, beats_loss=0.01121, ecapa_loss=0.0001529, whisper_loss=0.07814, over 16128.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01023, ecapa_loss=0.0001708, whisper_loss=0.09101, over 2076524.91 frames. ], batch size: 63, lr: 4.57e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:41:16,485 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5
2024-08-12 23:41:56,246 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.82 vs. limit=10.0
2024-08-12 23:42:08,663 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 16 from Vox, 47 from AS
2024-08-12 23:42:13,966 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts.
19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 23:42:20,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1885690.0, ans=0.125 2024-08-12 23:42:21,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1885690.0, ans=0.125 2024-08-12 23:43:16,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1885890.0, ans=0.2 2024-08-12 23:43:23,175 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 200, loss[loss=0.1174, beats_loss=0.009501, ecapa_loss=0.0001607, whisper_loss=0.1063, over 19106.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01034, ecapa_loss=0.0001703, whisper_loss=0.09135, over 2454633.42 frames. ], batch size: 73, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:43:57,746 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 23:44:22,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.61 vs. limit=10.0 2024-08-12 23:44:24,061 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 23:44:40,258 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.587e+01 2.870e+01 3.355e+01 1.552e+02, threshold=5.741e+01, percent-clipped=1.0 2024-08-12 23:44:58,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1886290.0, ans=0.125 2024-08-12 23:45:03,280 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 23:45:26,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 250, loss[loss=0.1238, beats_loss=0.006729, ecapa_loss=0.0002028, whisper_loss=0.1151, over 15768.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01034, ecapa_loss=0.0001704, whisper_loss=0.09123, over 2734219.69 frames. ], batch size: 59, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:45:37,565 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 23:45:53,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1886590.0, ans=0.0 2024-08-12 23:46:18,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1886690.0, ans=0.125 2024-08-12 23:46:21,260 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 23:46:37,222 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 23:47:14,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1886890.0, ans=0.07 2024-08-12 23:47:27,398 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 300, loss[loss=0.1217, beats_loss=0.01187, ecapa_loss=0.0001439, whisper_loss=0.1084, over 23214.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01045, ecapa_loss=0.0001712, whisper_loss=0.09164, over 2957702.06 frames. ], batch size: 90, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:47:57,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1887190.0, ans=0.125 2024-08-12 23:48:07,527 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
14 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 23:48:12,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1887290.0, ans=0.0 2024-08-12 23:48:16,133 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.332e+01 2.692e+01 3.047e+01 7.964e+01, threshold=5.385e+01, percent-clipped=1.0 2024-08-12 23:48:16,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1887290.0, ans=0.125 2024-08-12 23:48:42,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1887390.0, ans=0.125 2024-08-12 23:48:44,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 350, loss[loss=0.09746, beats_loss=0.0133, ecapa_loss=0.0001567, whisper_loss=0.0826, over 23231.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01049, ecapa_loss=0.0001685, whisper_loss=0.09124, over 3147213.22 frames. ], batch size: 94, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:49:03,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1887590.0, ans=0.125 2024-08-12 23:49:07,981 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2024-08-12 23:49:16,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1887690.0, ans=0.2 2024-08-12 23:49:16,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1887690.0, ans=0.125 2024-08-12 23:49:17,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.93 vs. 
limit=12.0 2024-08-12 23:49:21,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1887690.0, ans=0.125 2024-08-12 23:49:27,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1887690.0, ans=0.0 2024-08-12 23:49:59,319 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 400, loss[loss=0.1081, beats_loss=0.008133, ecapa_loss=0.0002038, whisper_loss=0.09792, over 22260.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01054, ecapa_loss=0.0001671, whisper_loss=0.09119, over 3281151.72 frames. ], batch size: 89, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:50:47,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1888290.0, ans=0.1 2024-08-12 23:50:51,608 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.354e+01 2.624e+01 3.158e+01 4.755e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-12 23:51:15,450 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.08 vs. limit=22.5 2024-08-12 23:51:17,137 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 450, loss[loss=0.1029, beats_loss=0.01203, ecapa_loss=0.00015, whisper_loss=0.08936, over 21004.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01054, ecapa_loss=0.0001681, whisper_loss=0.09164, over 3443676.54 frames. ], batch size: 82, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:52:00,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1888690.0, ans=0.07 2024-08-12 23:52:03,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. 
limit=15.0 2024-08-12 23:52:05,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1888790.0, ans=0.015 2024-08-12 23:52:12,785 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 23:52:13,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1888790.0, ans=0.125 2024-08-12 23:52:17,098 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 23:52:26,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1888890.0, ans=0.0 2024-08-12 23:52:33,521 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 500, loss[loss=0.1121, beats_loss=0.007458, ecapa_loss=0.0002001, whisper_loss=0.1026, over 16163.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001691, whisper_loss=0.09116, over 3534531.76 frames. ], batch size: 62, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:52:39,762 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 23:52:41,340 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-12 23:52:49,352 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 23:52:56,959 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 23:53:10,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1889190.0, ans=0.2 2024-08-12 23:53:23,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1889290.0, ans=0.125 2024-08-12 23:53:24,135 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.385e+01 2.695e+01 3.088e+01 5.680e+01, threshold=5.390e+01, percent-clipped=1.0 2024-08-12 23:53:37,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=12.0 2024-08-12 23:53:45,819 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.29 vs. limit=15.0 2024-08-12 23:53:51,193 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 550, loss[loss=0.1116, beats_loss=0.01036, ecapa_loss=0.0001883, whisper_loss=0.09933, over 20549.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001678, whisper_loss=0.09095, over 3613348.71 frames. ], batch size: 83, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:53:57,987 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-12 23:54:01,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1889490.0, ans=0.125 2024-08-12 23:54:02,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1889490.0, ans=0.0 2024-08-12 23:54:07,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.84 vs. 
limit=10.0 2024-08-12 23:54:45,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1889790.0, ans=0.125 2024-08-12 23:54:47,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2024-08-12 23:54:52,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.80 vs. limit=10.0 2024-08-12 23:54:59,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1889890.0, ans=0.125 2024-08-12 23:55:06,852 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 600, loss[loss=0.1012, beats_loss=0.01204, ecapa_loss=0.0002094, whisper_loss=0.08703, over 21375.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001675, whisper_loss=0.09077, over 3653693.74 frames. ], batch size: 88, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:55:08,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1889990.0, ans=0.0 2024-08-12 23:55:12,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1889990.0, ans=0.125 2024-08-12 23:55:31,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2024-08-12 23:55:37,908 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 23:55:42,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1890190.0, ans=0.125 2024-08-12 23:55:44,912 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
20 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 23:55:46,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-12 23:55:47,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-08-12 23:55:54,898 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.595e+01 2.472e+01 2.658e+01 3.015e+01 7.457e+01, threshold=5.315e+01, percent-clipped=1.0 2024-08-12 23:55:58,323 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-12 23:56:20,244 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 650, loss[loss=0.1129, beats_loss=0.008828, ecapa_loss=0.0001624, whisper_loss=0.1025, over 19233.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001678, whisper_loss=0.09061, over 3671131.52 frames. ], batch size: 73, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:56:41,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1890590.0, ans=0.125 2024-08-12 23:56:42,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1890590.0, ans=0.125 2024-08-12 23:56:53,661 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 23:56:58,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1890690.0, ans=0.125 2024-08-12 23:57:15,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1890790.0, ans=0.125 2024-08-12 23:57:23,719 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
29 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-12 23:57:24,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5 2024-08-12 23:57:36,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 700, loss[loss=0.07918, beats_loss=0.01201, ecapa_loss=0.0001969, whisper_loss=0.06519, over 13064.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001673, whisper_loss=0.09118, over 3723521.42 frames. ], batch size: 53, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:57:38,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2024-08-12 23:57:42,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1890990.0, ans=0.0 2024-08-12 23:57:48,476 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.97 vs. limit=6.0 2024-08-12 23:58:01,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1891090.0, ans=0.0 2024-08-12 23:58:07,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1891190.0, ans=0.125 2024-08-12 23:58:09,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1891190.0, ans=0.125 2024-08-12 23:58:18,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.24 vs. 
limit=15.0 2024-08-12 23:58:24,423 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.432e+01 2.727e+01 3.024e+01 4.665e+01, threshold=5.453e+01, percent-clipped=0.0 2024-08-12 23:58:25,940 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 32 from Vox, 26 fro AS 2024-08-12 23:58:28,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1891290.0, ans=0.0 2024-08-12 23:58:41,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1891390.0, ans=0.125 2024-08-12 23:58:46,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1891390.0, ans=0.125 2024-08-12 23:58:49,413 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 750, loss[loss=0.08234, beats_loss=0.01224, ecapa_loss=0.0001924, whisper_loss=0.06818, over 19815.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001665, whisper_loss=0.0907, over 3750906.29 frames. ], batch size: 87, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:58:49,601 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 23:58:54,591 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. 
limit=15.0 2024-08-12 23:58:58,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1891490.0, ans=0.0 2024-08-12 23:59:04,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1891590.0, ans=0.1 2024-08-12 23:59:08,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1891590.0, ans=0.125 2024-08-12 23:59:11,160 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 23:59:11,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1891590.0, ans=0.125 2024-08-12 23:59:31,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1891690.0, ans=0.0 2024-08-12 23:59:40,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.43 vs. limit=10.0 2024-08-12 23:59:56,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1891890.0, ans=0.125 2024-08-12 23:59:57,558 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 00:00:01,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1891890.0, ans=0.0 2024-08-13 00:00:03,814 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 800, loss[loss=0.13, beats_loss=0.009535, ecapa_loss=0.0001697, whisper_loss=0.1187, over 19728.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001668, whisper_loss=0.09088, over 3766953.88 frames. 
], batch size: 76, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:00:54,436 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.057e+01 2.376e+01 2.556e+01 2.956e+01 7.880e+01, threshold=5.112e+01, percent-clipped=1.0 2024-08-13 00:01:00,936 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 11 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 00:01:07,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1892390.0, ans=0.0 2024-08-13 00:01:09,526 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-13 00:01:10,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.62 vs. limit=22.5 2024-08-13 00:01:19,979 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 850, loss[loss=0.1051, beats_loss=0.01081, ecapa_loss=0.0001346, whisper_loss=0.09291, over 23575.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001671, whisper_loss=0.09056, over 3766351.70 frames. ], batch size: 91, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:01:44,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.02 vs. limit=22.5 2024-08-13 00:01:59,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.67 vs. limit=15.0 2024-08-13 00:02:27,593 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
21 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 00:02:30,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1892890.0, ans=0.1 2024-08-13 00:02:31,995 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 900, loss[loss=0.0868, beats_loss=0.01203, ecapa_loss=0.0001268, whisper_loss=0.0735, over 16196.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01073, ecapa_loss=0.0001659, whisper_loss=0.09021, over 3760389.90 frames. ], batch size: 63, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:02:33,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1892990.0, ans=0.125 2024-08-13 00:03:03,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1893190.0, ans=0.1 2024-08-13 00:03:06,219 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 00:03:16,937 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 00:03:19,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.408e+01 2.662e+01 2.977e+01 4.425e+01, threshold=5.325e+01, percent-clipped=0.0 2024-08-13 00:03:25,308 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 00:03:25,639 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:03:28,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1893390.0, ans=15.0 2024-08-13 00:03:29,392 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 00:03:38,995 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
31 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-13 00:03:39,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1893390.0, ans=0.125 2024-08-13 00:03:44,363 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 950, loss[loss=0.1071, beats_loss=0.008777, ecapa_loss=0.0001864, whisper_loss=0.09649, over 16165.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01074, ecapa_loss=0.0001641, whisper_loss=0.09031, over 3781199.37 frames. ], batch size: 63, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:03:51,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1893490.0, ans=0.125 2024-08-13 00:04:01,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1893590.0, ans=0.125 2024-08-13 00:04:19,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1893690.0, ans=0.125 2024-08-13 00:04:22,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1893690.0, ans=0.05 2024-08-13 00:04:26,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1893690.0, ans=0.1 2024-08-13 00:04:26,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1893690.0, ans=0.125 2024-08-13 00:04:34,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1893790.0, ans=0.125 2024-08-13 00:04:38,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1893790.0, ans=0.125 2024-08-13 00:04:41,532 INFO 
[scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1893790.0, ans=0.0 2024-08-13 00:04:48,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5 2024-08-13 00:04:59,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1000, loss[loss=0.09047, beats_loss=0.01387, ecapa_loss=0.0001568, whisper_loss=0.07504, over 17115.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01072, ecapa_loss=0.0001646, whisper_loss=0.09012, over 3764009.64 frames. ], batch size: 71, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:05:18,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1894090.0, ans=0.2 2024-08-13 00:05:19,505 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.735e+05 2024-08-13 00:05:29,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1894190.0, ans=0.1 2024-08-13 00:05:35,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1894190.0, ans=0.2 2024-08-13 00:05:48,068 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.405e+01 2.688e+01 3.061e+01 4.317e+01, threshold=5.377e+01, percent-clipped=0.0 2024-08-13 00:05:55,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1894290.0, ans=0.5 2024-08-13 00:06:10,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1894390.0, ans=0.1 2024-08-13 00:06:13,784 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1050, loss[loss=0.1184, beats_loss=0.009776, ecapa_loss=0.0001461, 
whisper_loss=0.1072, over 24064.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001636, whisper_loss=0.09057, over 3772648.63 frames. ], batch size: 92, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:06:15,266 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 34 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 00:06:31,009 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.06 vs. limit=15.0 2024-08-13 00:06:58,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1894690.0, ans=0.125 2024-08-13 00:07:06,463 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 00:07:14,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1894790.0, ans=0.125 2024-08-13 00:07:19,259 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 00:07:22,051 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-13 00:07:33,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1894990.0, ans=0.125 2024-08-13 00:07:34,266 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1100, loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001699, whisper_loss=0.09069, over 19614.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001643, whisper_loss=0.09062, over 3786052.57 frames. 
], batch size: 79, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:07:39,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1894990.0, ans=0.07 2024-08-13 00:07:54,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1895090.0, ans=0.125 2024-08-13 00:07:54,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.70 vs. limit=10.0 2024-08-13 00:08:21,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1895290.0, ans=0.0 2024-08-13 00:08:25,046 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.502e+01 2.869e+01 3.346e+01 6.186e+01, threshold=5.739e+01, percent-clipped=2.0 2024-08-13 00:08:44,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1895390.0, ans=0.05 2024-08-13 00:08:51,181 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1150, loss[loss=0.08431, beats_loss=0.006493, ecapa_loss=0.0002545, whisper_loss=0.07527, over 17137.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01061, ecapa_loss=0.0001647, whisper_loss=0.09121, over 3784318.30 frames. 
], batch size: 70, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:08:51,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1895490.0, ans=0.1 2024-08-13 00:08:59,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1895490.0, ans=0.0 2024-08-13 00:09:25,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1895690.0, ans=0.05 2024-08-13 00:09:31,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1895690.0, ans=10.0 2024-08-13 00:09:35,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1895690.0, ans=0.125 2024-08-13 00:09:40,821 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 43 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 00:09:41,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1895790.0, ans=0.1 2024-08-13 00:09:53,507 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 00:10:10,389 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1200, loss[loss=0.09125, beats_loss=0.01328, ecapa_loss=0.0001512, whisper_loss=0.07646, over 14761.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01068, ecapa_loss=0.0001643, whisper_loss=0.09108, over 3780571.77 frames. ], batch size: 59, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:10:11,901 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
25 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-13 00:11:06,012 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.344e+01 2.617e+01 3.051e+01 6.950e+01, threshold=5.235e+01, percent-clipped=1.0 2024-08-13 00:11:12,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1896290.0, ans=0.0 2024-08-13 00:11:27,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1896390.0, ans=0.125 2024-08-13 00:11:31,734 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1250, loss[loss=0.09571, beats_loss=0.01337, ecapa_loss=0.0001283, whisper_loss=0.08106, over 17138.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001639, whisper_loss=0.09089, over 3748940.50 frames. ], batch size: 66, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:12:07,227 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.39 vs. limit=12.0 2024-08-13 00:12:15,410 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 23 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-13 00:12:19,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1896790.0, ans=6.0 2024-08-13 00:12:19,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=12.0 2024-08-13 00:12:29,444 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 26 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-13 00:12:50,294 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1300, loss[loss=0.1012, beats_loss=0.009816, ecapa_loss=0.0001682, whisper_loss=0.08969, over 21183.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01073, ecapa_loss=0.0001641, whisper_loss=0.09153, over 3739703.05 frames. 
], batch size: 84, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:12:50,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1896990.0, ans=0.0 2024-08-13 00:12:54,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1896990.0, ans=0.0 2024-08-13 00:12:55,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1896990.0, ans=0.125 2024-08-13 00:13:01,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1896990.0, ans=0.09899494936611666 2024-08-13 00:13:03,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1896990.0, ans=0.0 2024-08-13 00:13:13,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1897090.0, ans=0.2 2024-08-13 00:13:26,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1897190.0, ans=0.125 2024-08-13 00:13:27,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1897190.0, ans=0.125 2024-08-13 00:13:43,314 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.447e+01 2.732e+01 3.060e+01 1.003e+02, threshold=5.464e+01, percent-clipped=1.0 2024-08-13 00:13:54,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1897390.0, ans=0.0 2024-08-13 00:13:57,299 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
17 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-13 00:14:00,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2024-08-13 00:14:01,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2024-08-13 00:14:12,430 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1350, loss[loss=0.1004, beats_loss=0.01158, ecapa_loss=0.0001407, whisper_loss=0.08741, over 17313.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01075, ecapa_loss=0.0001638, whisper_loss=0.09109, over 3736725.63 frames. ], batch size: 67, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:14:21,524 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=12.0 2024-08-13 00:15:01,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.53 vs. limit=15.0 2024-08-13 00:15:05,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1897790.0, ans=0.125 2024-08-13 00:15:06,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1897790.0, ans=0.125 2024-08-13 00:15:14,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1897790.0, ans=0.0 2024-08-13 00:15:16,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1897890.0, ans=15.0 2024-08-13 00:15:32,920 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1400, loss[loss=0.08743, beats_loss=0.01125, ecapa_loss=0.0001772, whisper_loss=0.07441, over 22057.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001634, whisper_loss=0.091, over 3757983.24 frames. ], batch size: 91, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:15:56,578 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 00:16:18,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1898190.0, ans=0.125 2024-08-13 00:16:23,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1898290.0, ans=0.125 2024-08-13 00:16:25,594 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.414e+01 2.708e+01 3.137e+01 5.162e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-13 00:16:36,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1898390.0, ans=0.125 2024-08-13 00:16:38,970 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 00:16:44,117 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 00:16:54,083 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1450, loss[loss=0.1041, beats_loss=0.01241, ecapa_loss=0.0001609, whisper_loss=0.09008, over 23188.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01081, ecapa_loss=0.0001637, whisper_loss=0.08976, over 3792284.73 frames. ], batch size: 92, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:17:28,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1898490.0, ans=0.125 2024-08-13 00:17:34,984 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
16 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 00:17:41,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1898590.0, ans=0.1 2024-08-13 00:17:52,942 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.707e-03 2024-08-13 00:17:54,575 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.788e+05 2024-08-13 00:18:05,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1898690.0, ans=0.07 2024-08-13 00:18:12,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1898790.0, ans=0.2 2024-08-13 00:18:14,738 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-13 00:18:43,406 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1500, loss[loss=0.08877, beats_loss=0.01208, ecapa_loss=0.0001337, whisper_loss=0.07535, over 14824.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01092, ecapa_loss=0.0001619, whisper_loss=0.08886, over 3794286.14 frames. ], batch size: 58, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:18:51,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1898990.0, ans=0.125 2024-08-13 00:19:07,847 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 00:19:12,389 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.89 vs. 
limit=15.0 2024-08-13 00:19:14,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1899190.0, ans=0.2 2024-08-13 00:19:27,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1899190.0, ans=0.0 2024-08-13 00:19:35,149 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.418e+01 2.688e+01 3.116e+01 4.487e+01, threshold=5.376e+01, percent-clipped=0.0 2024-08-13 00:19:45,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1899390.0, ans=0.1 2024-08-13 00:19:55,732 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 00:20:02,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1550, loss[loss=0.1209, beats_loss=0.008402, ecapa_loss=0.0001744, whisper_loss=0.1107, over 16354.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.0001629, whisper_loss=0.08999, over 3772573.40 frames. ], batch size: 64, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:20:04,606 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 00:20:10,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1899490.0, ans=0.125 2024-08-13 00:20:20,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.98 vs. 
limit=22.5 2024-08-13 00:20:30,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1899590.0, ans=0.2 2024-08-13 00:20:52,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1899790.0, ans=0.125 2024-08-13 00:21:05,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1899890.0, ans=0.125 2024-08-13 00:21:16,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1899890.0, ans=0.07 2024-08-13 00:21:20,767 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1600, loss[loss=0.08014, beats_loss=0.01108, ecapa_loss=0.0001991, whisper_loss=0.06706, over 20153.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001638, whisper_loss=0.09045, over 3779882.14 frames. ], batch size: 84, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:21:27,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1899990.0, ans=0.0 2024-08-13 00:21:58,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1900190.0, ans=0.0 2024-08-13 00:22:04,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1900190.0, ans=0.125 2024-08-13 00:22:12,560 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.583e+01 2.856e+01 3.340e+01 1.108e+02, threshold=5.712e+01, percent-clipped=2.0 2024-08-13 00:22:18,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1900290.0, ans=0.0 2024-08-13 00:22:38,235 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1650, loss[loss=0.1083, 
beats_loss=0.009091, ecapa_loss=0.0002583, whisper_loss=0.09667, over 17605.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001635, whisper_loss=0.09006, over 3802684.26 frames. ], batch size: 73, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:22:52,175 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:23:10,824 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-13 00:23:23,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1900790.0, ans=0.1 2024-08-13 00:23:32,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1900790.0, ans=0.2 2024-08-13 00:23:36,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1900890.0, ans=0.1 2024-08-13 00:23:49,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1900890.0, ans=0.0 2024-08-13 00:23:51,117 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2024-08-13 00:23:53,368 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1700, loss[loss=0.09791, beats_loss=0.01132, ecapa_loss=0.0001343, whisper_loss=0.08525, over 19131.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001619, whisper_loss=0.09099, over 3810244.14 frames. 
], batch size: 70, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:23:57,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1900990.0, ans=0.04949747468305833 2024-08-13 00:24:02,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1900990.0, ans=0.2 2024-08-13 00:24:10,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1901090.0, ans=0.04949747468305833 2024-08-13 00:24:14,022 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.26 vs. limit=6.0 2024-08-13 00:24:24,968 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 00:24:26,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1901190.0, ans=0.1 2024-08-13 00:24:30,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1901190.0, ans=0.125 2024-08-13 00:24:35,260 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
18 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 00:24:42,397 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.360e+01 2.688e+01 2.973e+01 4.042e+01, threshold=5.375e+01, percent-clipped=0.0 2024-08-13 00:24:51,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1901290.0, ans=0.2 2024-08-13 00:24:52,548 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.855e-01 2024-08-13 00:24:53,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1901390.0, ans=0.05 2024-08-13 00:25:03,749 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-13 00:25:07,772 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1750, loss[loss=0.09521, beats_loss=0.01233, ecapa_loss=0.0001564, whisper_loss=0.08132, over 23097.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.0001616, whisper_loss=0.09126, over 3843760.84 frames. ], batch size: 92, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:25:22,017 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
19 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 00:25:48,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1901690.0, ans=22.5 2024-08-13 00:26:00,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1901790.0, ans=0.125 2024-08-13 00:26:15,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1901890.0, ans=0.125 2024-08-13 00:26:20,505 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1800, loss[loss=0.1189, beats_loss=0.008722, ecapa_loss=0.0001466, whisper_loss=0.1087, over 20641.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01069, ecapa_loss=0.000162, whisper_loss=0.09145, over 3822112.45 frames. ], batch size: 80, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:26:39,129 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.609e-02 2024-08-13 00:26:41,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1902090.0, ans=0.2 2024-08-13 00:26:45,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1902090.0, ans=0.125 2024-08-13 00:27:03,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1902190.0, ans=0.2 2024-08-13 00:27:12,794 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.455e+01 2.703e+01 3.083e+01 4.143e+01, threshold=5.406e+01, percent-clipped=0.0 2024-08-13 00:27:21,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1902290.0, ans=0.1 2024-08-13 00:27:25,506 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 00:27:26,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1902390.0, ans=0.125 2024-08-13 00:27:35,194 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 00:27:36,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2024-08-13 00:27:40,220 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1850, loss[loss=0.1062, beats_loss=0.01056, ecapa_loss=0.0001698, whisper_loss=0.09389, over 18076.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.0001623, whisper_loss=0.09123, over 3845168.68 frames. ], batch size: 71, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:27:43,828 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 00:27:49,972 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 00:28:04,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1902590.0, ans=0.1 2024-08-13 00:28:05,204 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 33 from Vox, 30 fro AS 2024-08-13 00:28:37,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1902790.0, ans=0.125 2024-08-13 00:28:40,785 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.52 vs. 
limit=10.0 2024-08-13 00:28:44,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1902890.0, ans=0.1 2024-08-13 00:29:00,950 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1900, loss[loss=0.1083, beats_loss=0.01142, ecapa_loss=0.0001746, whisper_loss=0.09515, over 21870.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01074, ecapa_loss=0.0001635, whisper_loss=0.09149, over 3870853.03 frames. ], batch size: 89, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:29:09,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1902990.0, ans=0.0 2024-08-13 00:29:10,209 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 00:29:16,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1903090.0, ans=0.125 2024-08-13 00:29:28,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1903090.0, ans=0.125 2024-08-13 00:29:41,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1903190.0, ans=0.0 2024-08-13 00:29:53,713 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.471e+01 2.746e+01 3.040e+01 5.075e+01, threshold=5.492e+01, percent-clipped=0.0 2024-08-13 00:30:05,938 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 00:30:11,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1903390.0, ans=0.0 2024-08-13 00:30:11,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1903390.0, ans=0.09899494936611666 2024-08-13 00:30:19,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1903490.0, ans=0.2 2024-08-13 00:30:20,529 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 1950, loss[loss=0.105, beats_loss=0.01052, ecapa_loss=0.0001179, whisper_loss=0.09326, over 17318.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01074, ecapa_loss=0.000164, whisper_loss=0.09148, over 3830101.44 frames. ], batch size: 64, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:30:25,338 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 20 from LS+wenet, 25 from Vox, 49 fro AS 2024-08-13 00:30:44,187 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 14 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 00:31:07,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1903790.0, ans=0.1 2024-08-13 00:31:11,429 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:31:33,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1903890.0, ans=0.0 2024-08-13 00:31:39,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2000, loss[loss=0.1011, beats_loss=0.009261, ecapa_loss=0.0001904, whisper_loss=0.08997, over 18751.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01072, ecapa_loss=0.0001658, whisper_loss=0.09168, over 3843478.77 frames. 
], batch size: 75, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:31:52,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1903990.0, ans=0.1 2024-08-13 00:31:55,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1904090.0, ans=0.125 2024-08-13 00:32:02,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1904090.0, ans=0.0 2024-08-13 00:32:08,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1904190.0, ans=0.1 2024-08-13 00:32:20,984 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 12 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 00:32:27,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1904290.0, ans=0.2 2024-08-13 00:32:30,078 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.391e+01 2.734e+01 3.144e+01 4.841e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-13 00:32:49,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1904390.0, ans=0.125 2024-08-13 00:32:52,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1904390.0, ans=0.0 2024-08-13 00:32:56,042 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2050, loss[loss=0.1241, beats_loss=0.01035, ecapa_loss=0.0001286, whisper_loss=0.1125, over 23429.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01082, ecapa_loss=0.0001652, whisper_loss=0.09137, over 3866544.45 frames. 
], batch size: 86, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:33:05,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5 2024-08-13 00:33:09,236 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 00:33:15,537 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 00:33:22,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1904590.0, ans=0.0 2024-08-13 00:33:30,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1904690.0, ans=0.0 2024-08-13 00:33:48,873 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 00:33:54,958 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 00:33:58,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1904890.0, ans=0.125 2024-08-13 00:33:58,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-13 00:34:12,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2100, loss[loss=0.07942, beats_loss=0.007563, ecapa_loss=0.0001792, whisper_loss=0.07007, over 14266.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01093, ecapa_loss=0.0001637, whisper_loss=0.08996, over 3843023.35 frames. 
], batch size: 55, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:34:33,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1905090.0, ans=0.125 2024-08-13 00:34:39,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1905090.0, ans=0.07 2024-08-13 00:34:47,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1905190.0, ans=0.125 2024-08-13 00:35:03,408 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.317e+01 2.588e+01 2.864e+01 4.791e+01, threshold=5.176e+01, percent-clipped=0.0 2024-08-13 00:35:13,190 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-13 00:35:29,563 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2150, loss[loss=0.1024, beats_loss=0.01032, ecapa_loss=0.0001955, whisper_loss=0.09011, over 21760.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01098, ecapa_loss=0.0001643, whisper_loss=0.09022, over 3871369.33 frames. ], batch size: 89, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:35:38,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.65 vs. limit=10.0 2024-08-13 00:35:56,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-08-13 00:36:05,633 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
25 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-13 00:36:17,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1905790.0, ans=0.125 2024-08-13 00:36:18,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1905790.0, ans=0.125 2024-08-13 00:36:26,550 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 00:36:39,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.01 vs. limit=15.0 2024-08-13 00:36:51,241 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2200, loss[loss=0.1144, beats_loss=0.01038, ecapa_loss=0.0001743, whisper_loss=0.1023, over 21777.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01097, ecapa_loss=0.000164, whisper_loss=0.09038, over 3841045.84 frames. ], batch size: 88, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:37:26,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1906190.0, ans=0.125 2024-08-13 00:37:31,182 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-13 00:37:45,672 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.358e+01 2.742e+01 3.274e+01 9.057e+01, threshold=5.483e+01, percent-clipped=3.0 2024-08-13 00:37:46,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1906290.0, ans=0.125 2024-08-13 00:37:52,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1906290.0, ans=0.125 2024-08-13 00:37:55,182 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.02 vs. 
limit=12.0 2024-08-13 00:37:58,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=15.0 2024-08-13 00:38:02,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1906390.0, ans=0.125 2024-08-13 00:38:12,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1906490.0, ans=0.0 2024-08-13 00:38:13,406 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2250, loss[loss=0.1112, beats_loss=0.01175, ecapa_loss=0.0001586, whisper_loss=0.09789, over 22279.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01097, ecapa_loss=0.0001642, whisper_loss=0.09152, over 3848035.22 frames. ], batch size: 88, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:38:39,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1906590.0, ans=0.2 2024-08-13 00:38:47,218 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-13 00:38:52,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1906690.0, ans=0.125 2024-08-13 00:38:55,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1906690.0, ans=0.125 2024-08-13 00:39:04,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2024-08-13 00:39:06,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. 
limit=15.0 2024-08-13 00:39:27,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1906890.0, ans=0.1 2024-08-13 00:39:37,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2300, loss[loss=0.0903, beats_loss=0.01135, ecapa_loss=0.0001369, whisper_loss=0.07758, over 21344.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001649, whisper_loss=0.09185, over 3868476.43 frames. ], batch size: 84, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:39:45,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1906990.0, ans=0.125 2024-08-13 00:39:48,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.24 vs. limit=15.0 2024-08-13 00:39:58,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1907090.0, ans=15.0 2024-08-13 00:40:02,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1907090.0, ans=0.125 2024-08-13 00:40:24,095 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
20 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 00:40:25,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1907190.0, ans=0.0 2024-08-13 00:40:30,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1907290.0, ans=0.0 2024-08-13 00:40:32,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.474e+01 2.795e+01 3.232e+01 6.818e+01, threshold=5.590e+01, percent-clipped=1.0 2024-08-13 00:40:47,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1907390.0, ans=0.0 2024-08-13 00:40:50,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1907390.0, ans=0.0 2024-08-13 00:41:00,116 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2350, loss[loss=0.08571, beats_loss=0.0116, ecapa_loss=0.0001669, whisper_loss=0.07245, over 19687.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01095, ecapa_loss=0.0001642, whisper_loss=0.09212, over 3872083.12 frames. ], batch size: 78, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:41:12,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1907490.0, ans=0.125 2024-08-13 00:41:25,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1907590.0, ans=0.125 2024-08-13 00:41:28,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0 2024-08-13 00:41:37,551 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 00:41:45,395 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 00:42:22,820 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2400, loss[loss=0.08515, beats_loss=0.01142, ecapa_loss=0.0001206, whisper_loss=0.07253, over 19935.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01092, ecapa_loss=0.000165, whisper_loss=0.0922, over 3905493.71 frames. ], batch size: 76, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:42:23,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1907990.0, ans=0.125 2024-08-13 00:42:32,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1907990.0, ans=0.2 2024-08-13 00:43:01,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1908190.0, ans=0.0 2024-08-13 00:43:01,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1908190.0, ans=15.0 2024-08-13 00:43:13,862 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 20 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-13 00:43:16,948 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.472e+01 2.673e+01 3.015e+01 1.435e+02, threshold=5.346e+01, percent-clipped=1.0 2024-08-13 00:43:25,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1908290.0, ans=0.125 2024-08-13 00:43:27,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1908390.0, ans=0.0 2024-08-13 00:43:36,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.63 vs. 
limit=12.0 2024-08-13 00:43:45,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2450, loss[loss=0.09984, beats_loss=0.01089, ecapa_loss=0.0001556, whisper_loss=0.08739, over 19316.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01092, ecapa_loss=0.0001652, whisper_loss=0.09131, over 3882726.75 frames. ], batch size: 75, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:43:56,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=15.0 2024-08-13 00:44:08,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1908590.0, ans=0.125 2024-08-13 00:44:11,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1908590.0, ans=0.125 2024-08-13 00:44:23,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1908690.0, ans=0.0 2024-08-13 00:44:58,186 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 12 from Vox, 44 fro AS 2024-08-13 00:45:06,583 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2500, loss[loss=0.1054, beats_loss=0.01078, ecapa_loss=0.0001587, whisper_loss=0.09304, over 16336.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01094, ecapa_loss=0.000166, whisper_loss=0.09116, over 3874391.38 frames. ], batch size: 62, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:45:26,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1909090.0, ans=0.0 2024-08-13 00:45:31,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1909090.0, ans=0.2 2024-08-13 00:45:32,712 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
13 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 00:45:37,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1909090.0, ans=0.125 2024-08-13 00:45:54,122 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 00:46:01,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.554e+01 2.851e+01 3.287e+01 4.773e+01, threshold=5.702e+01, percent-clipped=0.0 2024-08-13 00:46:29,602 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 00:46:31,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2550, loss[loss=0.1042, beats_loss=0.009535, ecapa_loss=0.0001378, whisper_loss=0.09324, over 18038.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0109, ecapa_loss=0.0001653, whisper_loss=0.09196, over 3894480.46 frames. ], batch size: 65, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:46:36,009 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 00:46:59,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1909590.0, ans=0.0 2024-08-13 00:47:13,907 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:47:13,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1909690.0, ans=0.125 2024-08-13 00:47:35,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0 2024-08-13 00:47:53,758 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2600, loss[loss=0.1118, beats_loss=0.008377, ecapa_loss=0.0001604, whisper_loss=0.1018, over 18119.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.0001675, whisper_loss=0.09146, over 3897132.06 frames. ], batch size: 67, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:47:55,438 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-13 00:48:17,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1910090.0, ans=0.125 2024-08-13 00:48:24,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1910090.0, ans=0.1 2024-08-13 00:48:24,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1910090.0, ans=0.125 2024-08-13 00:48:52,318 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.514e+01 2.741e+01 3.048e+01 4.490e+01, threshold=5.482e+01, percent-clipped=0.0 2024-08-13 00:48:54,227 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 00:48:55,653 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 00:48:56,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1910290.0, ans=0.125 2024-08-13 00:49:21,800 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2650, loss[loss=0.1365, beats_loss=0.006819, ecapa_loss=0.0001559, whisper_loss=0.1282, over 17652.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01098, ecapa_loss=0.0001669, whisper_loss=0.0909, over 3906702.35 frames. ], batch size: 66, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:49:21,949 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
37 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-13 00:49:22,565 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-13 00:49:23,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1910490.0, ans=0.2 2024-08-13 00:49:25,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1910490.0, ans=0.0 2024-08-13 00:49:49,573 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 00:49:50,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1910590.0, ans=0.125 2024-08-13 00:49:57,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1910690.0, ans=0.05 2024-08-13 00:50:11,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1910790.0, ans=22.5 2024-08-13 00:50:18,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1910790.0, ans=0.1 2024-08-13 00:50:43,680 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2700, loss[loss=0.0971, beats_loss=0.01149, ecapa_loss=0.0001457, whisper_loss=0.08415, over 16303.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01092, ecapa_loss=0.0001679, whisper_loss=0.09165, over 3935131.36 frames. ], batch size: 64, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:51:17,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.06 vs. 
limit=15.0 2024-08-13 00:51:23,095 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.71 vs. limit=12.0 2024-08-13 00:51:23,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1911190.0, ans=0.2 2024-08-13 00:51:25,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1911190.0, ans=0.125 2024-08-13 00:51:30,193 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 00:51:30,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1911190.0, ans=0.1 2024-08-13 00:51:35,297 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 00:51:38,034 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.492e+01 2.764e+01 3.227e+01 2.218e+02, threshold=5.527e+01, percent-clipped=1.0 2024-08-13 00:51:48,833 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 00:51:58,346 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 42 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 00:52:06,525 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2750, loss[loss=0.1444, beats_loss=0.007789, ecapa_loss=0.0001635, whisper_loss=0.135, over 18337.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.0001682, whisper_loss=0.09182, over 3931736.53 frames. 
], batch size: 70, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:52:19,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1911490.0, ans=0.0 2024-08-13 00:52:36,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2024-08-13 00:52:39,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1911690.0, ans=0.125 2024-08-13 00:52:46,980 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 33 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 00:52:58,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1911790.0, ans=0.2 2024-08-13 00:53:24,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1911890.0, ans=0.1 2024-08-13 00:53:31,005 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2800, loss[loss=0.106, beats_loss=0.01054, ecapa_loss=0.0001803, whisper_loss=0.09368, over 17425.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.0001675, whisper_loss=0.09182, over 3916235.77 frames. 
], batch size: 69, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:54:10,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1912190.0, ans=0.125 2024-08-13 00:54:19,675 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:54:28,627 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.473e+01 2.733e+01 3.017e+01 4.460e+01, threshold=5.467e+01, percent-clipped=0.0 2024-08-13 00:54:42,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1912390.0, ans=0.05 2024-08-13 00:54:43,910 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 38 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 00:54:48,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.38 vs. limit=10.0 2024-08-13 00:54:54,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1912390.0, ans=0.1 2024-08-13 00:54:57,324 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2850, loss[loss=0.1208, beats_loss=0.008446, ecapa_loss=0.0001642, whisper_loss=0.1107, over 21979.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0109, ecapa_loss=0.0001672, whisper_loss=0.0922, over 3911396.93 frames. ], batch size: 85, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:55:07,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1912490.0, ans=0.125 2024-08-13 00:55:15,437 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
23 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-13 00:55:15,991 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2024-08-13 00:55:23,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1912590.0, ans=0.125 2024-08-13 00:55:23,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1912590.0, ans=0.1 2024-08-13 00:55:50,931 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-13 00:56:00,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1912790.0, ans=0.2 2024-08-13 00:56:15,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1912890.0, ans=0.2 2024-08-13 00:56:20,466 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2900, loss[loss=0.1105, beats_loss=0.0105, ecapa_loss=0.0002015, whisper_loss=0.098, over 15556.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.0001694, whisper_loss=0.09196, over 3930347.89 frames. ], batch size: 62, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:56:24,569 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 00:56:31,486 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-13 00:56:32,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1912990.0, ans=0.2 2024-08-13 00:56:39,721 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
22 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-13 00:56:49,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1913090.0, ans=0.125 2024-08-13 00:56:53,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1913090.0, ans=0.1 2024-08-13 00:57:15,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1913290.0, ans=0.05 2024-08-13 00:57:18,751 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.481e+01 2.818e+01 3.186e+01 4.138e+01, threshold=5.637e+01, percent-clipped=0.0 2024-08-13 00:57:45,362 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 2950, loss[loss=0.09254, beats_loss=0.008967, ecapa_loss=0.0001658, whisper_loss=0.08192, over 15966.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01082, ecapa_loss=0.0001702, whisper_loss=0.09255, over 3933414.00 frames. ], batch size: 62, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:57:58,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1913490.0, ans=0.125 2024-08-13 00:58:02,780 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 00:58:09,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1913590.0, ans=0.125 2024-08-13 00:58:13,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1913590.0, ans=0.125 2024-08-13 00:58:28,649 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-13 00:58:44,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1913790.0, ans=0.125 2024-08-13 00:59:01,329 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 00:59:02,548 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3000, loss[loss=0.09249, beats_loss=0.0118, ecapa_loss=0.0001785, whisper_loss=0.0789, over 14173.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01074, ecapa_loss=0.0001695, whisper_loss=0.09265, over 3911142.98 frames. ], batch size: 57, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:59:02,548 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 00:59:43,434 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.0005759, whisper_loss=0.2486, over 922467.00 frames. 2024-08-13 01:00:02,163 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on SV_voxceleb1: loss=0.004628, beats_loss=0, ecapa_loss=0.0004628, whisper_loss=0, over 939242.00 frames. 2024-08-13 01:01:59,794 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on AT_audioset: loss=0.02407, beats_loss=0.02407, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 01:01:59,797 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-13 01:02:50,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.523e+01 2.729e+01 3.233e+01 5.051e+01, threshold=5.458e+01, percent-clipped=0.0 2024-08-13 01:03:09,626 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 01:03:16,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3050, loss[loss=0.1012, beats_loss=0.008289, ecapa_loss=0.0002048, whisper_loss=0.0909, over 21783.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.01078, ecapa_loss=0.0001697, whisper_loss=0.09304, over 3936845.96 frames. ], batch size: 86, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:03:23,741 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 01:03:30,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1914590.0, ans=0.0 2024-08-13 01:03:31,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1914590.0, ans=0.0 2024-08-13 01:03:33,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1914590.0, ans=0.125 2024-08-13 01:03:52,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1914690.0, ans=0.0 2024-08-13 01:04:07,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1914790.0, ans=0.02 2024-08-13 01:04:08,425 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 01:04:14,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1914890.0, ans=0.0 2024-08-13 01:04:19,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1914890.0, ans=0.1 2024-08-13 01:04:19,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1914890.0, ans=0.125 2024-08-13 01:04:30,253 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3100, loss[loss=0.1029, beats_loss=0.01019, ecapa_loss=0.000143, whisper_loss=0.0913, over 21144.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.01083, ecapa_loss=0.0001704, whisper_loss=0.09301, over 3915895.27 frames. ], batch size: 79, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:04:36,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1914990.0, ans=0.5 2024-08-13 01:04:49,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2024-08-13 01:04:51,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1915090.0, ans=0.5 2024-08-13 01:04:55,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1915090.0, ans=0.125 2024-08-13 01:04:56,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0 2024-08-13 01:05:18,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1915290.0, ans=0.1 2024-08-13 01:05:18,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.430e+01 2.726e+01 3.080e+01 5.396e+01, threshold=5.451e+01, percent-clipped=0.0 2024-08-13 01:05:18,851 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 01:05:44,507 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3150, loss[loss=0.09301, beats_loss=0.0133, ecapa_loss=0.0001565, whisper_loss=0.07814, over 20926.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01081, ecapa_loss=0.0001698, whisper_loss=0.09307, over 3918784.40 frames. ], batch size: 83, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:05:49,249 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
31 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 01:06:22,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1915690.0, ans=0.1 2024-08-13 01:06:41,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1915790.0, ans=0.0 2024-08-13 01:06:45,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.10 vs. limit=22.5 2024-08-13 01:06:52,661 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 01:06:56,697 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 01:06:58,193 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3200, loss[loss=0.0946, beats_loss=0.009844, ecapa_loss=0.0001522, whisper_loss=0.08324, over 14044.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01081, ecapa_loss=0.0001705, whisper_loss=0.09325, over 3904902.49 frames. ], batch size: 54, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:07:01,232 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 01:07:22,902 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 01:07:28,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1916190.0, ans=0.125 2024-08-13 01:07:40,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1916290.0, ans=0.0 2024-08-13 01:07:45,881 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.363e+01 2.691e+01 2.946e+01 6.786e+01, threshold=5.382e+01, percent-clipped=1.0 2024-08-13 01:07:53,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0 2024-08-13 01:08:03,043 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:08:10,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3250, loss[loss=0.1042, beats_loss=0.01042, ecapa_loss=0.0001824, whisper_loss=0.09199, over 23039.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01081, ecapa_loss=0.0001703, whisper_loss=0.09368, over 3900925.53 frames. ], batch size: 93, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:08:25,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1916590.0, ans=0.0 2024-08-13 01:08:38,797 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0 2024-08-13 01:08:52,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.82 vs. 
limit=22.5 2024-08-13 01:08:53,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1916690.0, ans=0.125 2024-08-13 01:08:57,470 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 01:09:12,148 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 01:09:15,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1916890.0, ans=0.2 2024-08-13 01:09:21,973 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 16 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 01:09:23,435 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-08-13 01:09:25,371 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3300, loss[loss=0.1039, beats_loss=0.01137, ecapa_loss=0.0001662, whisper_loss=0.09085, over 17355.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01083, ecapa_loss=0.0001712, whisper_loss=0.09343, over 3913872.45 frames. ], batch size: 69, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:09:49,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1917090.0, ans=0.0 2024-08-13 01:09:59,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0 2024-08-13 01:10:08,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1917290.0, ans=0.0 2024-08-13 01:10:09,050 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. 
limit=15.0 2024-08-13 01:10:13,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.401e+01 2.681e+01 3.036e+01 4.663e+01, threshold=5.362e+01, percent-clipped=0.0 2024-08-13 01:10:30,131 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 01:10:33,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1917390.0, ans=0.0 2024-08-13 01:10:38,815 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3350, loss[loss=0.1066, beats_loss=0.01045, ecapa_loss=0.0001269, whisper_loss=0.09489, over 22788.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01075, ecapa_loss=0.0001697, whisper_loss=0.09332, over 3913084.49 frames. ], batch size: 84, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:10:58,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1917590.0, ans=0.1 2024-08-13 01:11:03,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1917590.0, ans=0.07 2024-08-13 01:11:17,288 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-13 01:11:20,741 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 01:11:29,639 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 01:11:30,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1917790.0, ans=0.125 2024-08-13 01:11:34,036 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.88 vs. 
limit=22.5 2024-08-13 01:11:42,524 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 01:11:52,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1917890.0, ans=0.1 2024-08-13 01:11:53,788 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 01:11:55,780 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3400, loss[loss=0.1199, beats_loss=0.008858, ecapa_loss=0.0001807, whisper_loss=0.1092, over 22823.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01075, ecapa_loss=0.0001688, whisper_loss=0.09321, over 3945386.89 frames. ], batch size: 92, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:11:56,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1917990.0, ans=0.0 2024-08-13 01:11:57,713 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 01:12:07,609 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2024-08-13 01:12:11,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1918090.0, ans=0.2 2024-08-13 01:12:45,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.443e+01 2.703e+01 3.105e+01 5.409e+01, threshold=5.407e+01, percent-clipped=1.0 2024-08-13 01:12:49,959 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 01:12:59,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1918390.0, ans=0.125 2024-08-13 01:13:08,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1918390.0, ans=0.2 2024-08-13 01:13:10,379 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3450, loss[loss=0.1241, beats_loss=0.01017, ecapa_loss=0.0001531, whisper_loss=0.1124, over 16851.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01081, ecapa_loss=0.0001693, whisper_loss=0.09228, over 3920122.93 frames. ], batch size: 65, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:13:10,543 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-13 01:13:33,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1918590.0, ans=0.1 2024-08-13 01:13:38,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1918690.0, ans=0.125 2024-08-13 01:14:20,055 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3500, loss[loss=0.1079, beats_loss=0.01156, ecapa_loss=0.0001697, whisper_loss=0.09461, over 22145.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.0001702, whisper_loss=0.09189, over 3884036.22 frames. ], batch size: 88, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:14:27,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1918990.0, ans=0.0 2024-08-13 01:14:42,406 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
28 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 01:14:42,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2024-08-13 01:14:45,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1919090.0, ans=0.125 2024-08-13 01:15:05,869 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.466e+01 2.782e+01 3.112e+01 6.873e+01, threshold=5.565e+01, percent-clipped=2.0 2024-08-13 01:15:18,465 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 01:15:29,697 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3550, loss[loss=0.1317, beats_loss=0.01011, ecapa_loss=0.0001238, whisper_loss=0.1203, over 15439.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.0001704, whisper_loss=0.09197, over 3911176.48 frames. ], batch size: 55, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:15:30,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1919490.0, ans=0.2 2024-08-13 01:15:44,009 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 01:15:57,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0 2024-08-13 01:16:21,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1919790.0, ans=0.125 2024-08-13 01:16:27,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1919890.0, ans=0.0 2024-08-13 01:16:29,536 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
30 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 01:16:36,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1919890.0, ans=0.125 2024-08-13 01:16:40,621 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3600, loss[loss=0.09692, beats_loss=0.01089, ecapa_loss=0.0001638, whisper_loss=0.08439, over 17772.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01075, ecapa_loss=0.0001702, whisper_loss=0.09218, over 3890657.35 frames. ], batch size: 68, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:16:45,282 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 01:16:49,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.14 vs. limit=15.0 2024-08-13 01:16:51,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-13 01:16:55,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1919990.0, ans=0.125 2024-08-13 01:17:00,713 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 01:17:04,330 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2024-08-13 01:17:18,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1920190.0, ans=0.0 2024-08-13 01:17:30,089 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.424e+01 2.680e+01 3.106e+01 1.010e+02, threshold=5.360e+01, percent-clipped=5.0 2024-08-13 01:17:40,031 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 01:17:41,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1920390.0, ans=0.125 2024-08-13 01:17:53,652 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3650, loss[loss=0.1243, beats_loss=0.01092, ecapa_loss=0.0001409, whisper_loss=0.1119, over 24279.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01083, ecapa_loss=0.0001709, whisper_loss=0.09209, over 3895457.00 frames. ], batch size: 88, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:17:55,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1920490.0, ans=0.125 2024-08-13 01:18:01,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1920490.0, ans=0.0 2024-08-13 01:18:04,743 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 01:18:11,907 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 01:18:18,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=15.0 2024-08-13 01:18:20,444 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 01:18:22,012 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 01:18:39,976 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 01:18:50,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.07 vs. 
limit=6.0 2024-08-13 01:19:03,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3700, loss[loss=0.09546, beats_loss=0.01152, ecapa_loss=0.0001441, whisper_loss=0.08249, over 20243.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01083, ecapa_loss=0.0001702, whisper_loss=0.09225, over 3888766.27 frames. ], batch size: 79, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:19:05,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2024-08-13 01:19:06,644 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-13 01:19:40,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2024-08-13 01:19:41,663 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 01:19:42,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1921190.0, ans=0.125 2024-08-13 01:19:49,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.425e+01 2.811e+01 3.262e+01 7.758e+01, threshold=5.621e+01, percent-clipped=2.0 2024-08-13 01:19:51,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1921290.0, ans=0.2 2024-08-13 01:19:55,746 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-13 01:19:59,929 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 01:20:04,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1921390.0, ans=0.0 2024-08-13 01:20:08,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1921390.0, ans=0.125 2024-08-13 01:20:08,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1921390.0, ans=0.0 2024-08-13 01:20:13,855 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3750, loss[loss=0.1057, beats_loss=0.01201, ecapa_loss=0.0001492, whisper_loss=0.09221, over 21861.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001702, whisper_loss=0.09169, over 3859639.75 frames. ], batch size: 88, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:20:38,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1921590.0, ans=0.125 2024-08-13 01:20:44,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1921690.0, ans=0.125 2024-08-13 01:20:52,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1921690.0, ans=0.1 2024-08-13 01:20:53,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1921690.0, ans=0.2 2024-08-13 01:20:53,747 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. 
limit=15.0 2024-08-13 01:20:56,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1921790.0, ans=0.1 2024-08-13 01:21:10,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.90 vs. limit=22.5 2024-08-13 01:21:22,052 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 01:21:23,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3800, loss[loss=0.1015, beats_loss=0.0121, ecapa_loss=0.0001563, whisper_loss=0.08782, over 15493.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01099, ecapa_loss=0.0001696, whisper_loss=0.09055, over 3844474.93 frames. ], batch size: 61, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:21:26,036 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 01:21:27,394 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 01:21:29,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1921990.0, ans=0.0 2024-08-13 01:21:33,405 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.947e-02 2024-08-13 01:21:34,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1921990.0, ans=0.1 2024-08-13 01:21:38,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1922090.0, ans=0.125 2024-08-13 01:21:41,991 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:21:52,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1922190.0, ans=0.125 2024-08-13 01:22:01,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1922190.0, ans=0.125 2024-08-13 01:22:08,996 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.489e+01 2.785e+01 3.114e+01 6.895e+01, threshold=5.569e+01, percent-clipped=1.0 2024-08-13 01:22:18,373 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-13 01:22:20,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1922390.0, ans=15.0 2024-08-13 01:22:26,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1922390.0, ans=0.125 2024-08-13 01:22:32,672 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3850, loss[loss=0.1003, beats_loss=0.01195, ecapa_loss=0.0001708, whisper_loss=0.08664, over 14317.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01094, ecapa_loss=0.0001699, whisper_loss=0.09143, over 3845646.47 frames. ], batch size: 57, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:22:49,838 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 01:22:57,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1922590.0, ans=0.125 2024-08-13 01:23:21,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1922790.0, ans=0.2 2024-08-13 01:23:32,374 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.639e-02 2024-08-13 01:23:37,568 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 01:23:39,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1922890.0, ans=0.0 2024-08-13 01:23:42,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3900, loss[loss=0.1249, beats_loss=0.005907, ecapa_loss=0.0002211, whisper_loss=0.1167, over 17488.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01091, ecapa_loss=0.0001707, whisper_loss=0.09155, over 3853072.10 frames. 
], batch size: 68, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:23:50,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1922990.0, ans=0.2 2024-08-13 01:24:02,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1923090.0, ans=0.125 2024-08-13 01:24:13,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1923190.0, ans=0.125 2024-08-13 01:24:16,499 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=12.0 2024-08-13 01:24:28,529 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.595e+01 2.867e+01 3.243e+01 6.009e+01, threshold=5.735e+01, percent-clipped=1.0 2024-08-13 01:24:32,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1923290.0, ans=0.125 2024-08-13 01:24:39,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2024-08-13 01:24:52,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 3950, loss[loss=0.1239, beats_loss=0.01065, ecapa_loss=0.0001547, whisper_loss=0.1117, over 23210.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01084, ecapa_loss=0.0001703, whisper_loss=0.09246, over 3895898.23 frames. ], batch size: 93, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:24:55,270 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
13 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 01:24:55,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1923490.0, ans=0.1 2024-08-13 01:25:25,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=12.0 2024-08-13 01:25:30,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1923690.0, ans=0.05 2024-08-13 01:25:34,372 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 01:25:47,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1923890.0, ans=0.125 2024-08-13 01:25:47,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1923890.0, ans=0.07 2024-08-13 01:25:54,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1923890.0, ans=0.125 2024-08-13 01:25:54,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1923890.0, ans=0.2 2024-08-13 01:26:02,099 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4000, loss[loss=0.114, beats_loss=0.01077, ecapa_loss=0.0001904, whisper_loss=0.1013, over 22694.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01081, ecapa_loss=0.0001715, whisper_loss=0.0921, over 3861022.26 frames. 
], batch size: 91, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:26:18,018 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:26:23,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1924090.0, ans=0.125 2024-08-13 01:26:47,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.537e+01 2.883e+01 3.271e+01 5.034e+01, threshold=5.767e+01, percent-clipped=0.0 2024-08-13 01:26:52,286 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 01:27:02,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1924390.0, ans=0.125 2024-08-13 01:27:11,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-08-13 01:27:12,040 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4050, loss[loss=0.06814, beats_loss=0.01198, ecapa_loss=0.0001712, whisper_loss=0.05444, over 21348.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01085, ecapa_loss=0.0001706, whisper_loss=0.09134, over 3870967.66 frames. ], batch size: 90, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:27:15,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1924490.0, ans=0.025 2024-08-13 01:27:16,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1924490.0, ans=0.04949747468305833 2024-08-13 01:27:26,155 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
25 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 01:28:04,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1924790.0, ans=0.0 2024-08-13 01:28:09,214 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-13 01:28:19,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1924890.0, ans=0.1 2024-08-13 01:28:21,336 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4100, loss[loss=0.1018, beats_loss=0.007259, ecapa_loss=0.000229, whisper_loss=0.09224, over 13250.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01095, ecapa_loss=0.0001697, whisper_loss=0.09152, over 3880644.16 frames. ], batch size: 55, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:28:33,740 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.88 vs. limit=10.0 2024-08-13 01:28:33,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.92 vs. limit=15.0 2024-08-13 01:28:35,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1925090.0, ans=15.0 2024-08-13 01:28:37,326 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 30 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 01:28:52,699 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 01:29:01,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1925190.0, ans=0.125 2024-08-13 01:29:08,166 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.339e+01 2.647e+01 3.027e+01 3.702e+01, threshold=5.294e+01, percent-clipped=0.0 2024-08-13 01:29:10,759 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.48 vs. limit=15.0 2024-08-13 01:29:22,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2024-08-13 01:29:22,773 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-13 01:29:32,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4150, loss[loss=0.09693, beats_loss=0.01151, ecapa_loss=0.0001781, whisper_loss=0.08364, over 19957.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.0001708, whisper_loss=0.09145, over 3862684.44 frames. 
], batch size: 84, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:29:37,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1925490.0, ans=0.0 2024-08-13 01:29:37,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1925490.0, ans=0.0 2024-08-13 01:29:38,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1925490.0, ans=0.0 2024-08-13 01:29:40,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1925490.0, ans=0.07 2024-08-13 01:29:43,393 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0 2024-08-13 01:29:47,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1925590.0, ans=0.125 2024-08-13 01:29:51,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1925590.0, ans=0.2 2024-08-13 01:29:53,805 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 33 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 01:29:54,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1925590.0, ans=0.2 2024-08-13 01:30:00,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1925690.0, ans=0.0 2024-08-13 01:30:04,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1925690.0, ans=0.1 2024-08-13 01:30:24,657 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
19 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-13 01:30:28,591 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 01:30:29,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.72 vs. limit=15.0 2024-08-13 01:30:43,187 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4200, loss[loss=0.1061, beats_loss=0.01017, ecapa_loss=0.0001754, whisper_loss=0.09414, over 19309.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001697, whisper_loss=0.09161, over 3873233.06 frames. ], batch size: 73, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:30:46,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1925990.0, ans=0.0 2024-08-13 01:30:50,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1925990.0, ans=0.05 2024-08-13 01:30:54,382 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 01:30:58,417 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-13 01:31:14,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1926190.0, ans=0.0 2024-08-13 01:31:14,509 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0 2024-08-13 01:31:28,519 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.387e+01 2.732e+01 2.995e+01 7.981e+01, threshold=5.463e+01, percent-clipped=1.0 2024-08-13 01:31:34,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.68 vs. 
limit=22.5 2024-08-13 01:31:40,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1926390.0, ans=0.025 2024-08-13 01:31:52,409 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4250, loss[loss=0.08732, beats_loss=0.0131, ecapa_loss=0.0001346, whisper_loss=0.07288, over 22341.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01096, ecapa_loss=0.0001671, whisper_loss=0.09077, over 3889223.66 frames. ], batch size: 90, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:31:52,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1926490.0, ans=0.0 2024-08-13 01:31:56,802 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 01:32:02,535 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-13 01:32:09,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1926590.0, ans=0.0 2024-08-13 01:32:29,686 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.564e+01 2024-08-13 01:32:36,016 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 01:32:42,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1926790.0, ans=0.1 2024-08-13 01:33:02,219 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4300, loss[loss=0.1214, beats_loss=0.007959, ecapa_loss=0.0001456, whisper_loss=0.112, over 15803.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01101, ecapa_loss=0.0001672, whisper_loss=0.09008, over 3890630.78 frames. ], batch size: 58, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:33:10,523 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 01:33:15,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1927090.0, ans=0.1 2024-08-13 01:33:23,033 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-13 01:33:25,302 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.52 vs. limit=22.5 2024-08-13 01:33:30,064 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 10 from Vox, 36 fro AS 2024-08-13 01:33:40,039 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 01:33:47,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1927290.0, ans=0.125 2024-08-13 01:33:48,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.403e+01 2.611e+01 3.081e+01 4.718e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-13 01:33:52,317 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 01:33:57,295 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0 2024-08-13 01:34:11,437 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4350, loss[loss=0.1169, beats_loss=0.01113, ecapa_loss=0.0001581, whisper_loss=0.1041, over 22893.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01096, ecapa_loss=0.000167, whisper_loss=0.09063, over 3892271.92 frames. 
], batch size: 91, lr: 4.52e-03, grad_scale: 1.152921504606847e+18 2024-08-13 01:34:16,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1927490.0, ans=0.1 2024-08-13 01:34:25,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2024-08-13 01:34:44,843 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-13 01:34:45,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1927690.0, ans=10.0 2024-08-13 01:34:49,383 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 01:34:49,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1927690.0, ans=0.0 2024-08-13 01:35:00,202 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-13 01:35:10,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1927890.0, ans=0.125 2024-08-13 01:35:21,037 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4400, loss[loss=0.1316, beats_loss=0.01019, ecapa_loss=0.0001392, whisper_loss=0.12, over 21690.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0109, ecapa_loss=0.000167, whisper_loss=0.0916, over 3888825.86 frames. ], batch size: 81, lr: 4.52e-03, grad_scale: 1.152921504606847e+18 2024-08-13 01:35:25,901 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. 
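The `grad_scale` values in these lines are exact powers of two (the scale steps from 5.764607523034235e+17 to 1.152921504606847e+18 around batch 4350, i.e. from 2^59 to 2^60), which is consistent with a dynamic loss scaler that doubles or halves the scale, as `torch.cuda.amp.GradScaler` does. A quick check of the printed values:

```python
# The grad_scale values logged above are exact powers of two, as expected
# from a doubling/halving dynamic loss scaler (e.g. torch.cuda.amp.GradScaler).
assert 2.0 ** 59 == 5.764607523034235e+17  # grad_scale before batch 4350
assert 2.0 ** 60 == 1.152921504606847e+18  # grad_scale from batch 4350 on
```

Note the run uses bf16 autocast (`dtype=torch.bfloat16`, `use_amp=True` in the config header); the scaler growing rather than shrinking here indicates no overflow-driven backoffs in this stretch of training.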
limit=15.0 2024-08-13 01:35:31,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-13 01:35:32,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1927990.0, ans=0.1 2024-08-13 01:36:06,775 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.420e+01 2.637e+01 3.058e+01 4.603e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-13 01:36:11,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1928290.0, ans=0.125 2024-08-13 01:36:20,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1928390.0, ans=0.1 2024-08-13 01:36:24,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1928390.0, ans=0.125 2024-08-13 01:36:29,466 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 01:36:30,547 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4450, loss[loss=0.08114, beats_loss=0.01188, ecapa_loss=0.0001651, whisper_loss=0.06761, over 19116.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01092, ecapa_loss=0.0001671, whisper_loss=0.0913, over 3917029.38 frames. ], batch size: 76, lr: 4.52e-03, grad_scale: 1.152921504606847e+18 2024-08-13 01:36:46,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1928590.0, ans=15.0 2024-08-13 01:36:55,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1928590.0, ans=0.125 2024-08-13 01:36:58,566 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
20 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-13 01:37:05,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.22 vs. limit=15.0 2024-08-13 01:37:17,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1928790.0, ans=0.125 2024-08-13 01:37:22,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1928790.0, ans=0.125 2024-08-13 01:37:27,706 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 01:37:39,854 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4500, loss[loss=0.1102, beats_loss=0.009426, ecapa_loss=0.0001443, whisper_loss=0.0993, over 16504.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01094, ecapa_loss=0.0001656, whisper_loss=0.09103, over 3920503.05 frames. ], batch size: 61, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:37:40,033 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 01:37:45,743 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
18 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-13 01:37:58,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1929090.0, ans=0.0 2024-08-13 01:38:04,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1929090.0, ans=0.125 2024-08-13 01:38:04,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1929090.0, ans=0.125 2024-08-13 01:38:12,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1929190.0, ans=0.125 2024-08-13 01:38:15,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1929190.0, ans=0.125 2024-08-13 01:38:20,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1929290.0, ans=0.125 2024-08-13 01:38:27,274 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.366e+01 2.717e+01 3.132e+01 4.916e+01, threshold=5.434e+01, percent-clipped=0.0 2024-08-13 01:38:45,556 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 01:38:49,592 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4550, loss[loss=0.0926, beats_loss=0.01247, ecapa_loss=0.0001867, whisper_loss=0.07826, over 18823.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.000167, whisper_loss=0.09113, over 3941862.42 frames. ], batch size: 79, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:38:49,758 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 01:38:51,177 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
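The `optim.py` lines above ("Clipping_scale=2.0, grad-norm quartiles ... threshold=...") follow a visible pattern: the printed threshold equals `Clipping_scale` times the median (third quartile value of the five printed, i.e. the 50th percentile) of recent gradient norms. For example, in the entry just above, 2.0 × 2.717e+01 = 5.434e+01, the logged threshold. A minimal sketch of that relationship, assuming a simple median over a window (icefall's actual smoothing of the statistic is not shown in this log):

```python
import statistics

# Hedged sketch: the clipping threshold printed by optim.py matches
# clipping_scale * median(recent grad norms). The windowing/smoothing
# icefall really uses is assumed; clip_threshold is an illustrative name.
def clip_threshold(recent_grad_norms, clipping_scale=2.0):
    return clipping_scale * statistics.median(recent_grad_norms)

# The five quartile values from the log entry above (min, Q1, median, Q3, max):
thr = clip_threshold([18.72, 23.66, 27.17, 31.32, 49.16])  # ~54.34
```

Under this rule the threshold adapts to the typical gradient magnitude, so `percent-clipped` stays near zero except when an outlier batch (e.g. the 3.628e+02 max at batch 4700) exceeds twice the running median.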
29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 01:38:59,860 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 01:39:09,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1929590.0, ans=0.125 2024-08-13 01:39:12,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1929590.0, ans=0.2 2024-08-13 01:39:21,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1929690.0, ans=0.2 2024-08-13 01:39:24,580 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-13 01:39:59,451 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4600, loss[loss=0.08187, beats_loss=0.01079, ecapa_loss=0.0001351, whisper_loss=0.06973, over 16849.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01097, ecapa_loss=0.0001662, whisper_loss=0.09119, over 3930505.18 frames. ], batch size: 65, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:40:08,253 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 21 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 01:40:18,659 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=12.0 2024-08-13 01:40:19,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1930090.0, ans=0.125 2024-08-13 01:40:27,723 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 01:40:36,772 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
36 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-13 01:40:46,084 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.515e+01 2.755e+01 3.045e+01 4.770e+01, threshold=5.510e+01, percent-clipped=0.0 2024-08-13 01:40:47,856 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-13 01:40:52,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.51 vs. limit=22.5 2024-08-13 01:41:01,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1930390.0, ans=0.2 2024-08-13 01:41:04,299 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:41:04,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2024-08-13 01:41:05,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1930390.0, ans=0.1 2024-08-13 01:41:07,983 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4650, loss[loss=0.1001, beats_loss=0.01176, ecapa_loss=0.0001915, whisper_loss=0.08639, over 19188.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.0001671, whisper_loss=0.09132, over 3948854.25 frames. 
], batch size: 76, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:41:09,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1930490.0, ans=0.1 2024-08-13 01:41:15,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1930490.0, ans=0.125 2024-08-13 01:41:22,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1930590.0, ans=0.1 2024-08-13 01:41:23,490 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 01:41:33,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1930590.0, ans=0.125 2024-08-13 01:41:44,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1930690.0, ans=0.2 2024-08-13 01:41:46,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0 2024-08-13 01:41:47,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2024-08-13 01:42:04,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1930890.0, ans=0.125 2024-08-13 01:42:09,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.63 vs. 
limit=10.0 2024-08-13 01:42:10,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1930890.0, ans=0.125 2024-08-13 01:42:16,705 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4700, loss[loss=0.121, beats_loss=0.008204, ecapa_loss=0.0001923, whisper_loss=0.1109, over 22454.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01094, ecapa_loss=0.0001667, whisper_loss=0.09219, over 3954532.41 frames. ], batch size: 88, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:42:17,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1930990.0, ans=0.125 2024-08-13 01:42:30,984 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 01:43:03,144 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.542e+01 2.823e+01 3.098e+01 3.628e+02, threshold=5.646e+01, percent-clipped=2.0 2024-08-13 01:43:05,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1931290.0, ans=0.1 2024-08-13 01:43:07,439 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
26 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-13 01:43:12,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1931390.0, ans=0.125 2024-08-13 01:43:13,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1931390.0, ans=0.1 2024-08-13 01:43:23,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1931390.0, ans=0.0 2024-08-13 01:43:26,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4750, loss[loss=0.08503, beats_loss=0.008838, ecapa_loss=0.0002029, whisper_loss=0.07416, over 14765.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01095, ecapa_loss=0.0001665, whisper_loss=0.09191, over 3964298.53 frames. ], batch size: 59, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:43:36,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=22.5 2024-08-13 01:43:45,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1931590.0, ans=0.2 2024-08-13 01:43:52,201 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
20 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 01:44:07,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1931690.0, ans=0.2 2024-08-13 01:44:07,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1931690.0, ans=0.125 2024-08-13 01:44:09,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1931790.0, ans=0.2 2024-08-13 01:44:09,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1931790.0, ans=0.125 2024-08-13 01:44:14,980 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 01:44:34,586 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 01:44:41,183 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 01:44:42,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4800, loss[loss=0.1171, beats_loss=0.01107, ecapa_loss=0.0001614, whisper_loss=0.1044, over 18268.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01103, ecapa_loss=0.000167, whisper_loss=0.09122, over 3945808.86 frames. ], batch size: 73, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:45:41,022 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2024-08-13 01:45:49,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.506e+01 2.786e+01 3.078e+01 4.876e+01, threshold=5.572e+01, percent-clipped=0.0 2024-08-13 01:46:03,415 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 01:46:11,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1932390.0, ans=0.1 2024-08-13 01:46:11,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1932390.0, ans=0.2 2024-08-13 01:46:14,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1932390.0, ans=0.2 2024-08-13 01:46:21,803 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4850, loss[loss=0.08828, beats_loss=0.01177, ecapa_loss=0.0001811, whisper_loss=0.0747, over 21931.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01113, ecapa_loss=0.0001663, whisper_loss=0.09108, over 3954347.89 frames. ], batch size: 93, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:46:27,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1932490.0, ans=0.05 2024-08-13 01:46:38,038 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:46:46,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1932590.0, ans=0.125 2024-08-13 01:46:50,058 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 01:47:35,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=12.0 2024-08-13 01:47:39,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=1932790.0, ans=0.02 2024-08-13 01:47:51,629 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 01:47:54,158 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 01:48:11,569 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4900, loss[loss=0.1084, beats_loss=0.009042, ecapa_loss=0.0001796, whisper_loss=0.09756, over 18087.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01105, ecapa_loss=0.0001667, whisper_loss=0.09211, over 3929246.09 frames. ], batch size: 70, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:48:21,023 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 01:48:21,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1932990.0, ans=0.125 2024-08-13 01:49:32,266 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.462e+01 2.765e+01 3.056e+01 4.985e+01, threshold=5.531e+01, percent-clipped=0.0 2024-08-13 01:49:50,192 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 01:49:55,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=12.0 2024-08-13 01:49:57,979 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 01:50:03,707 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 4950, loss[loss=0.1089, beats_loss=0.01244, ecapa_loss=0.0001656, whisper_loss=0.09482, over 19726.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.0001672, whisper_loss=0.0919, over 3904263.13 frames. 
], batch size: 81, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:50:29,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1933590.0, ans=0.0 2024-08-13 01:50:33,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1933690.0, ans=0.1 2024-08-13 01:50:35,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1933690.0, ans=0.2 2024-08-13 01:50:36,462 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-13 01:50:41,252 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:50:49,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1933790.0, ans=0.125 2024-08-13 01:50:57,619 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 38 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-13 01:50:58,830 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-13 01:50:59,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1933790.0, ans=10.0 2024-08-13 01:51:00,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1933790.0, ans=15.0 2024-08-13 01:51:10,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1933890.0, ans=0.1 2024-08-13 01:51:14,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1933890.0, ans=0.0 2024-08-13 01:51:16,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1933890.0, ans=0.125 2024-08-13 01:51:19,448 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 01:51:20,979 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5000, loss[loss=0.1041, beats_loss=0.01016, ecapa_loss=0.0001759, whisper_loss=0.09221, over 21257.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01106, ecapa_loss=0.0001673, whisper_loss=0.09187, over 3888154.57 frames. ], batch size: 86, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:51:25,474 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 01:51:29,177 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 01:51:38,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1934090.0, ans=0.125 2024-08-13 01:51:43,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=1934090.0, ans=22.5 2024-08-13 01:51:45,376 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 01:51:46,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1934090.0, ans=15.0 2024-08-13 01:51:50,602 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 01:51:52,036 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-13 01:51:52,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1934190.0, ans=0.125 2024-08-13 01:51:58,438 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-13 01:52:13,261 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.373e+01 2.737e+01 3.184e+01 6.268e+01, threshold=5.474e+01, percent-clipped=1.0 2024-08-13 01:52:22,361 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 01:52:26,123 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2024-08-13 01:52:30,346 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.22 vs. 
limit=12.0 2024-08-13 01:52:39,236 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5050, loss[loss=0.09483, beats_loss=0.01074, ecapa_loss=0.0001955, whisper_loss=0.08213, over 15231.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01117, ecapa_loss=0.0001666, whisper_loss=0.09079, over 3892382.72 frames. ], batch size: 64, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:52:43,170 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-13 01:52:44,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1934490.0, ans=0.125 2024-08-13 01:53:09,053 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 01:53:10,983 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:53:20,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1934690.0, ans=0.125 2024-08-13 01:53:30,430 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 01:53:31,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1934790.0, ans=0.125 2024-08-13 01:53:31,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.80 vs. limit=22.5 2024-08-13 01:53:44,431 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
18 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-13 01:53:46,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1934890.0, ans=0.125 2024-08-13 01:53:50,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1934890.0, ans=0.2 2024-08-13 01:54:00,083 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5100, loss[loss=0.1019, beats_loss=0.01257, ecapa_loss=0.0001553, whisper_loss=0.08781, over 21168.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01112, ecapa_loss=0.0001665, whisper_loss=0.09114, over 3927322.94 frames. ], batch size: 85, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:54:09,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1934990.0, ans=0.1 2024-08-13 01:54:13,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1934990.0, ans=0.125 2024-08-13 01:54:16,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1935090.0, ans=0.0 2024-08-13 01:54:32,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1935190.0, ans=0.125 2024-08-13 01:54:32,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1935190.0, ans=0.1 2024-08-13 01:54:32,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.70 vs. limit=10.0 2024-08-13 01:54:34,930 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
28 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 01:54:56,869 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.475e+01 2.679e+01 3.018e+01 4.914e+01, threshold=5.357e+01, percent-clipped=0.0 2024-08-13 01:55:06,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1935390.0, ans=0.0 2024-08-13 01:55:06,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1935390.0, ans=0.125 2024-08-13 01:55:22,003 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5150, loss[loss=0.1025, beats_loss=0.01307, ecapa_loss=0.0001507, whisper_loss=0.0879, over 22412.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01109, ecapa_loss=0.0001655, whisper_loss=0.09132, over 3912131.40 frames. ], batch size: 88, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:55:28,467 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 01:55:44,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1935590.0, ans=0.09899494936611666 2024-08-13 01:55:50,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1935590.0, ans=0.125 2024-08-13 01:55:52,087 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:55:53,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1935590.0, ans=0.0 2024-08-13 01:55:54,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1935690.0, ans=0.125 2024-08-13 01:56:05,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, 
batch_count=1935690.0, ans=0.1 2024-08-13 01:56:38,354 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 01:56:47,541 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5200, loss[loss=0.09655, beats_loss=0.01105, ecapa_loss=0.0001541, whisper_loss=0.08396, over 18699.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01106, ecapa_loss=0.0001661, whisper_loss=0.09119, over 3882843.60 frames. ], batch size: 71, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:57:10,584 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 01:57:11,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1936090.0, ans=0.2 2024-08-13 01:57:18,451 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-13 01:57:42,205 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.433e+01 2.676e+01 3.023e+01 1.012e+02, threshold=5.352e+01, percent-clipped=2.0 2024-08-13 01:57:43,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1936290.0, ans=0.0 2024-08-13 01:57:44,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1936290.0, ans=0.0 2024-08-13 01:58:07,321 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 01:58:08,513 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5250, loss[loss=0.1148, beats_loss=0.009663, ecapa_loss=0.0001669, whisper_loss=0.1035, over 17248.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.011, ecapa_loss=0.0001656, whisper_loss=0.09178, over 3871763.24 frames. 
], batch size: 68, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:58:20,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1936490.0, ans=0.125 2024-08-13 01:58:22,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1936490.0, ans=0.125 2024-08-13 01:58:25,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.89 vs. limit=15.0 2024-08-13 01:58:40,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2024-08-13 01:58:43,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2024-08-13 01:59:14,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1936890.0, ans=0.125 2024-08-13 01:59:15,879 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-13 01:59:30,441 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5300, loss[loss=0.09339, beats_loss=0.01156, ecapa_loss=0.0001642, whisper_loss=0.08019, over 21497.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01101, ecapa_loss=0.0001662, whisper_loss=0.09122, over 3861637.09 frames. ], batch size: 89, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:59:32,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1936990.0, ans=0.125 2024-08-13 01:59:35,547 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-13 01:59:41,865 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 01:59:48,067 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-13 01:59:52,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1937090.0, ans=0.125 2024-08-13 01:59:58,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1937090.0, ans=0.0 2024-08-13 02:00:07,920 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 12 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 02:00:08,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1937190.0, ans=0.0 2024-08-13 02:00:16,203 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-13 02:00:25,685 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.483e+01 2.816e+01 3.213e+01 1.142e+02, threshold=5.632e+01, percent-clipped=3.0 2024-08-13 02:00:28,855 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 02:00:30,194 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 02:00:33,852 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 02:00:43,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1937390.0, ans=0.0 2024-08-13 02:00:51,030 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5350, loss[loss=0.09625, beats_loss=0.0103, ecapa_loss=0.0002122, whisper_loss=0.08382, over 20775.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01094, ecapa_loss=0.0001669, whisper_loss=0.09128, over 3836572.57 frames. 
], batch size: 88, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:00:53,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1937490.0, ans=0.1 2024-08-13 02:01:00,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1937490.0, ans=0.125 2024-08-13 02:01:01,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=1937490.0, ans=0.2 2024-08-13 02:01:04,138 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 02:01:05,929 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 02:01:09,961 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-13 02:01:15,212 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 02:01:15,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1937590.0, ans=0.125 2024-08-13 02:01:23,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1937690.0, ans=0.125 2024-08-13 02:01:29,031 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=11.10 vs. limit=12.0 2024-08-13 02:01:45,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1937790.0, ans=0.1 2024-08-13 02:01:46,322 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
22 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 02:02:01,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1937890.0, ans=0.1 2024-08-13 02:02:13,434 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5400, loss[loss=0.1274, beats_loss=0.009921, ecapa_loss=0.0001828, whisper_loss=0.1156, over 19191.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01087, ecapa_loss=0.0001678, whisper_loss=0.09193, over 3845618.87 frames. ], batch size: 73, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:02:13,526 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 02:02:17,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1937990.0, ans=0.0 2024-08-13 02:02:24,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1937990.0, ans=0.04949747468305833 2024-08-13 02:02:28,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1937990.0, ans=0.1 2024-08-13 02:02:30,957 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 02:03:08,324 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 02:03:09,722 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.492e+01 2.751e+01 3.252e+01 5.304e+01, threshold=5.502e+01, percent-clipped=0.0 2024-08-13 02:03:37,125 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5450, loss[loss=0.08718, beats_loss=0.009378, ecapa_loss=0.0001741, whisper_loss=0.07606, over 14028.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01089, ecapa_loss=0.0001678, whisper_loss=0.09185, over 3838615.35 frames. 
], batch size: 55, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:03:37,313 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 29 from Vox, 18 fro AS 2024-08-13 02:03:39,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1938490.0, ans=0.95 2024-08-13 02:03:47,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1938490.0, ans=0.5 2024-08-13 02:03:55,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1938590.0, ans=0.035 2024-08-13 02:03:57,424 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 02:03:57,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1938590.0, ans=0.025 2024-08-13 02:04:06,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1938590.0, ans=0.0 2024-08-13 02:04:08,785 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 02:04:20,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1938690.0, ans=0.2 2024-08-13 02:04:59,450 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5500, loss[loss=0.1031, beats_loss=0.009412, ecapa_loss=0.0002068, whisper_loss=0.09164, over 19691.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01095, ecapa_loss=0.0001676, whisper_loss=0.09182, over 3846200.60 frames. ], batch size: 80, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:05:16,469 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
16 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 02:05:22,405 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0 2024-08-13 02:05:31,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1939190.0, ans=0.09899494936611666 2024-08-13 02:05:35,638 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 02:05:46,273 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 02:05:49,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1939290.0, ans=0.1 2024-08-13 02:05:52,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1939290.0, ans=0.0 2024-08-13 02:05:52,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.471e+01 2.738e+01 3.080e+01 7.605e+01, threshold=5.476e+01, percent-clipped=2.0 2024-08-13 02:05:57,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1939290.0, ans=0.125 2024-08-13 02:06:18,510 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5550, loss[loss=0.1015, beats_loss=0.01086, ecapa_loss=0.0001467, whisper_loss=0.08916, over 20408.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01094, ecapa_loss=0.0001686, whisper_loss=0.09238, over 3865712.56 frames. 
], batch size: 77, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:06:19,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1939490.0, ans=0.125 2024-08-13 02:06:26,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1939490.0, ans=0.0 2024-08-13 02:06:31,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1939490.0, ans=0.1 2024-08-13 02:06:38,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=15.0 2024-08-13 02:06:47,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1939590.0, ans=0.2 2024-08-13 02:06:51,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1939690.0, ans=0.5 2024-08-13 02:06:52,425 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-13 02:07:09,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1939790.0, ans=0.0 2024-08-13 02:07:18,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1939790.0, ans=0.125 2024-08-13 02:07:26,579 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-13 02:07:29,392 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-13 02:07:32,647 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
32 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-13 02:07:37,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1939990.0, ans=0.125 2024-08-13 02:07:38,636 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5600, loss[loss=0.1031, beats_loss=0.01136, ecapa_loss=0.0001854, whisper_loss=0.08994, over 17583.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0109, ecapa_loss=0.0001684, whisper_loss=0.09266, over 3874101.74 frames. ], batch size: 69, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:07:41,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1939990.0, ans=0.2 2024-08-13 02:07:56,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1940090.0, ans=0.0 2024-08-13 02:08:05,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1940090.0, ans=0.125 2024-08-13 02:08:08,246 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=12.0 2024-08-13 02:08:11,268 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 02:08:33,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1940290.0, ans=0.1 2024-08-13 02:08:35,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.482e+01 2.705e+01 3.003e+01 6.205e+01, threshold=5.410e+01, percent-clipped=1.0 2024-08-13 02:08:46,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1940390.0, ans=0.035 2024-08-13 02:08:53,692 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
38 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 02:09:01,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5650, loss[loss=0.1053, beats_loss=0.009999, ecapa_loss=0.0001551, whisper_loss=0.09379, over 15843.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01091, ecapa_loss=0.0001676, whisper_loss=0.09161, over 3883895.28 frames. ], batch size: 62, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:09:23,894 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 02:09:29,912 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 02:09:34,670 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 02:09:38,813 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 02:09:52,740 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 02:10:03,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1940790.0, ans=0.125 2024-08-13 02:10:22,404 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5700, loss[loss=0.1222, beats_loss=0.008714, ecapa_loss=0.000241, whisper_loss=0.1111, over 20350.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.0001687, whisper_loss=0.09202, over 3901831.64 frames. ], batch size: 85, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:10:28,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1940990.0, ans=0.1 2024-08-13 02:10:37,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1941090.0, ans=0.0 2024-08-13 02:10:42,279 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 
27 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-13 02:10:52,200 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-13 02:11:01,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1941190.0, ans=0.125 2024-08-13 02:11:03,827 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 02:11:05,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1941190.0, ans=0.0 2024-08-13 02:11:08,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1941290.0, ans=0.0 2024-08-13 02:11:10,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1941290.0, ans=0.2 2024-08-13 02:11:16,720 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.518e+01 2.759e+01 3.173e+01 1.965e+02, threshold=5.519e+01, percent-clipped=1.0 2024-08-13 02:11:17,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1941290.0, ans=0.5 2024-08-13 02:11:41,417 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5750, loss[loss=0.1051, beats_loss=0.01096, ecapa_loss=0.0001926, whisper_loss=0.09221, over 21852.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01089, ecapa_loss=0.0001697, whisper_loss=0.09226, over 3912780.91 frames. 
], batch size: 89, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:12:09,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1941590.0, ans=0.2 2024-08-13 02:12:15,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1941690.0, ans=0.125 2024-08-13 02:12:19,860 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 02:12:36,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1941790.0, ans=0.125 2024-08-13 02:12:48,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1941890.0, ans=0.0 2024-08-13 02:12:49,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1941890.0, ans=0.2 2024-08-13 02:12:53,566 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.45 vs. limit=15.0 2024-08-13 02:12:54,072 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-13 02:13:02,518 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5800, loss[loss=0.09989, beats_loss=0.009923, ecapa_loss=0.0001886, whisper_loss=0.08808, over 18500.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01089, ecapa_loss=0.00017, whisper_loss=0.09213, over 3937088.85 frames. ], batch size: 77, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:13:02,677 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
24 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-13 02:13:09,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=1941990.0, ans=0.1 2024-08-13 02:13:25,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1942090.0, ans=0.1 2024-08-13 02:13:50,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1942290.0, ans=0.125 2024-08-13 02:13:55,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1942290.0, ans=0.0 2024-08-13 02:13:57,663 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.443e+01 2.748e+01 3.161e+01 4.611e+01, threshold=5.495e+01, percent-clipped=0.0 2024-08-13 02:14:17,176 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 17 from LS+wenet, 36 from Vox, 37 fro AS 2024-08-13 02:14:17,939 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.75 vs. limit=15.0 2024-08-13 02:14:24,573 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5850, loss[loss=0.1097, beats_loss=0.009525, ecapa_loss=0.0001981, whisper_loss=0.09817, over 15486.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.0001696, whisper_loss=0.09162, over 3916088.27 frames. 
], batch size: 62, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:14:45,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1942590.0, ans=0.1 2024-08-13 02:14:54,289 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.919e-03 2024-08-13 02:15:08,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1942690.0, ans=0.125 2024-08-13 02:15:21,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=15.0 2024-08-13 02:15:31,449 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 02:15:34,901 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 02:15:44,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=12.0 2024-08-13 02:15:47,592 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5900, loss[loss=0.1387, beats_loss=0.008235, ecapa_loss=0.0001838, whisper_loss=0.1287, over 23595.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01099, ecapa_loss=0.0001689, whisper_loss=0.09137, over 3893985.33 frames. ], batch size: 90, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:15:55,979 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 02:16:02,073 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 31 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 02:16:07,702 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 02:16:09,195 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 02:16:20,747 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 02:16:22,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1943190.0, ans=0.07 2024-08-13 02:16:25,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1943190.0, ans=0.2 2024-08-13 02:16:36,269 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 02:16:40,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1943290.0, ans=0.04949747468305833 2024-08-13 02:16:40,878 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.528e+01 2.790e+01 3.084e+01 1.766e+02, threshold=5.581e+01, percent-clipped=1.0 2024-08-13 02:16:42,782 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 02:17:07,177 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 5950, loss[loss=0.1088, beats_loss=0.01002, ecapa_loss=0.0001765, whisper_loss=0.09704, over 22933.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01099, ecapa_loss=0.0001685, whisper_loss=0.09154, over 3887720.19 frames. ], batch size: 92, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:17:10,274 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2024-08-13 02:17:10,878 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 02:18:02,478 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
19 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-13 02:18:05,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1943790.0, ans=0.125 2024-08-13 02:18:05,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1943790.0, ans=0.1 2024-08-13 02:18:13,701 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2024-08-13 02:18:23,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1943890.0, ans=0.0 2024-08-13 02:18:28,326 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6000, loss[loss=0.1053, beats_loss=0.01101, ecapa_loss=0.000159, whisper_loss=0.09269, over 18974.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01103, ecapa_loss=0.0001674, whisper_loss=0.09172, over 3917812.46 frames. ], batch size: 77, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:18:28,326 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 02:19:07,020 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on ASR_libri: loss=0.2552, beats_loss=0, ecapa_loss=0.0005835, whisper_loss=0.2494, over 922467.00 frames. 2024-08-13 02:19:25,644 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on SV_voxceleb1: loss=0.004586, beats_loss=0, ecapa_loss=0.0004586, whisper_loss=0, over 939242.00 frames. 2024-08-13 02:20:53,209 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1366, 1.7948, 1.9018, 1.7537], device='cuda:2') 2024-08-13 02:21:14,572 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on AT_audioset: loss=0.02397, beats_loss=0.02397, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-13 02:21:14,575 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-13 02:21:30,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1944090.0, ans=0.04949747468305833 2024-08-13 02:21:41,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1944090.0, ans=0.1 2024-08-13 02:21:42,940 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 02:21:57,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1944190.0, ans=0.2 2024-08-13 02:22:08,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=12.0 2024-08-13 02:22:08,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=15.0 2024-08-13 02:22:10,216 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.472e+01 2.800e+01 3.130e+01 4.518e+01, threshold=5.599e+01, percent-clipped=0.0 2024-08-13 02:22:34,246 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.39 vs. limit=15.0 2024-08-13 02:22:35,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6050, loss[loss=0.09614, beats_loss=0.01112, ecapa_loss=0.0001654, whisper_loss=0.08336, over 15864.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0111, ecapa_loss=0.0001665, whisper_loss=0.09089, over 3892534.49 frames. 
], batch size: 62, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:22:41,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1944490.0, ans=0.125 2024-08-13 02:22:51,080 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-13 02:23:06,622 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 02:23:16,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1944690.0, ans=0.1 2024-08-13 02:23:30,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1944790.0, ans=0.0 2024-08-13 02:23:57,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1944990.0, ans=0.125 2024-08-13 02:23:57,534 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 02:23:58,260 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6100, loss[loss=0.08739, beats_loss=0.01232, ecapa_loss=0.0001614, whisper_loss=0.07346, over 23039.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01106, ecapa_loss=0.0001673, whisper_loss=0.09137, over 3937925.08 frames. ], batch size: 94, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:23:59,007 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.60 vs. 
limit=10.0 2024-08-13 02:24:23,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1945090.0, ans=0.0 2024-08-13 02:24:25,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1945090.0, ans=0.0 2024-08-13 02:24:32,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1945190.0, ans=0.125 2024-08-13 02:24:33,666 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 from AS 2024-08-13 02:24:39,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1945190.0, ans=0.125 2024-08-13 02:24:52,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1945290.0, ans=0.0 2024-08-13 02:24:53,728 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.564e+01 2.945e+01 3.314e+01 6.954e+01, threshold=5.890e+01, percent-clipped=1.0 2024-08-13 02:24:59,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1945290.0, ans=0.125 2024-08-13 02:25:11,188 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 24 from Vox, 30 from AS 2024-08-13 02:25:13,586 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.32 vs. limit=15.0 2024-08-13 02:25:17,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1945390.0, ans=0.125 2024-08-13 02:25:21,094 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6150, loss[loss=0.09513, beats_loss=0.01087, ecapa_loss=0.000174, whisper_loss=0.08252, over 17150.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01105, ecapa_loss=0.0001678, whisper_loss=0.09122, over 3930201.99 frames. ], batch size: 71, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:25:26,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1945490.0, ans=0.125 2024-08-13 02:25:58,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1945690.0, ans=0.125 2024-08-13 02:26:03,765 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.69 vs. limit=10.0 2024-08-13 02:26:13,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-08-13 02:26:21,265 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 16 from Vox, 48 from AS 2024-08-13 02:26:23,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1945790.0, ans=0.125 2024-08-13 02:26:29,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1945890.0, ans=0.125 2024-08-13 02:26:42,029 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6200, loss[loss=0.105, beats_loss=0.01109, ecapa_loss=0.0001786, whisper_loss=0.09213, over 20223.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01098, ecapa_loss=0.0001681, whisper_loss=0.09135, over 3908828.67 frames. 
], batch size: 80, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:26:42,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1945990.0, ans=0.07 2024-08-13 02:26:46,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1945990.0, ans=0.125 2024-08-13 02:26:58,184 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS 2024-08-13 02:26:59,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1946090.0, ans=0.2 2024-08-13 02:27:00,003 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 from AS 2024-08-13 02:27:17,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1946190.0, ans=0.125 2024-08-13 02:27:31,374 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2024-08-13 02:27:39,570 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.499e+01 2.801e+01 3.134e+01 4.474e+01, threshold=5.602e+01, percent-clipped=0.0 2024-08-13 02:27:42,834 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 18 from Vox, 18 from AS 2024-08-13 02:27:44,082 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 from AS 2024-08-13 02:27:47,256 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 from AS 2024-08-13 02:28:05,151 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6250, loss[loss=0.09623, beats_loss=0.01012, ecapa_loss=0.0001593, whisper_loss=0.08452, over 22490.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.0001682, whisper_loss=0.09184, over 3915443.69 frames. ], batch size: 91, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:28:07,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.85 vs. limit=15.0 2024-08-13 02:28:15,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1946490.0, ans=0.0 2024-08-13 02:28:27,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1946590.0, ans=0.125 2024-08-13 02:28:39,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.21 vs. limit=22.5 2024-08-13 02:28:43,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1946690.0, ans=0.0 2024-08-13 02:28:48,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1946690.0, ans=0.0 2024-08-13 02:28:54,769 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 20 from Vox, 38 from AS 2024-08-13 02:29:03,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2024-08-13 02:29:15,292 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 22 from Vox, 32 from AS 2024-08-13 02:29:17,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1946890.0, ans=0.125 2024-08-13 02:29:26,987 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6300, loss[loss=0.1046, beats_loss=0.01238, ecapa_loss=0.0001826, whisper_loss=0.09041, over 21245.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01086, ecapa_loss=0.0001686, whisper_loss=0.09191, over 3888639.46 frames. ], batch size: 85, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:29:33,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1946990.0, ans=10.0 2024-08-13 02:29:35,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1946990.0, ans=0.125 2024-08-13 02:29:38,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.71 vs. limit=15.0 2024-08-13 02:29:54,443 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 17 from LS+wenet, 25 from Vox, 49 from AS 2024-08-13 02:29:59,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1947190.0, ans=0.2 2024-08-13 02:29:59,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1947190.0, ans=0.125 2024-08-13 02:30:10,635 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 from AS 2024-08-13 02:30:20,803 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.428e+01 2.719e+01 3.075e+01 5.745e+01, threshold=5.438e+01, percent-clipped=1.0 2024-08-13 02:30:28,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1947290.0, ans=0.125 2024-08-13 02:30:31,845 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 from AS 2024-08-13 02:30:45,818 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6350, loss[loss=0.1168, beats_loss=0.01147, ecapa_loss=0.0001472, whisper_loss=0.1038, over 17224.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.0001688, whisper_loss=0.0914, over 3880811.95 frames. ], batch size: 64, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:31:02,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1947590.0, ans=0.125 2024-08-13 02:31:19,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1947690.0, ans=0.125 2024-08-13 02:31:25,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1947690.0, ans=0.125 2024-08-13 02:31:31,858 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 33 from Vox, 38 from AS 2024-08-13 02:31:31,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1947690.0, ans=0.015 2024-08-13 02:31:51,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1947890.0, ans=0.125 2024-08-13 02:31:59,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1947890.0, ans=0.1 2024-08-13 02:32:07,137 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6400, loss[loss=0.1117, beats_loss=0.01204, ecapa_loss=0.0001475, whisper_loss=0.09816, over 23751.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01092, ecapa_loss=0.0001675, whisper_loss=0.09095, over 3900512.87 frames. ], batch size: 92, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:32:13,620 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
28 from LS+wenet, 21 from Vox, 43 from AS 2024-08-13 02:32:36,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1948090.0, ans=0.125 2024-08-13 02:32:49,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1948190.0, ans=0.125 2024-08-13 02:32:56,559 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 from AS 2024-08-13 02:32:57,849 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 34 from LS+wenet, 17 from Vox, 33 from AS 2024-08-13 02:33:04,968 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.410e+01 2.725e+01 3.146e+01 5.039e+01, threshold=5.450e+01, percent-clipped=0.0 2024-08-13 02:33:15,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1948390.0, ans=0.0 2024-08-13 02:33:24,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1948390.0, ans=0.05 2024-08-13 02:33:28,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1948390.0, ans=0.1 2024-08-13 02:33:31,100 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6450, loss[loss=0.09392, beats_loss=0.01313, ecapa_loss=0.0001459, whisper_loss=0.07933, over 17726.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0109, ecapa_loss=0.0001685, whisper_loss=0.09177, over 3888966.24 frames. ], batch size: 70, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:33:39,715 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.80 vs. 
limit=15.0 2024-08-13 02:33:44,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1948490.0, ans=0.125 2024-08-13 02:34:31,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1948790.0, ans=0.0 2024-08-13 02:34:55,241 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6500, loss[loss=0.109, beats_loss=0.01042, ecapa_loss=0.0001874, whisper_loss=0.09667, over 22235.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01087, ecapa_loss=0.0001685, whisper_loss=0.09245, over 3897349.28 frames. ], batch size: 94, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:35:09,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1949090.0, ans=0.125 2024-08-13 02:35:19,268 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 from AS 2024-08-13 02:35:33,958 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 22 from Vox, 37 from AS 2024-08-13 02:35:37,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2024-08-13 02:35:38,206 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 18 from Vox, 34 from AS 2024-08-13 02:35:43,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1949290.0, ans=0.2 2024-08-13 02:35:51,029 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.473e+01 2.682e+01 2.925e+01 4.435e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-13 02:35:58,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. 
limit=6.0 2024-08-13 02:36:09,406 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 27 from Vox, 33 from AS 2024-08-13 02:36:12,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1949390.0, ans=0.125 2024-08-13 02:36:17,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6550, loss[loss=0.1201, beats_loss=0.009228, ecapa_loss=0.0002191, whisper_loss=0.1087, over 23241.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0109, ecapa_loss=0.0001688, whisper_loss=0.09235, over 3890473.86 frames. ], batch size: 94, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:36:32,743 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 from AS 2024-08-13 02:36:55,221 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-13 02:36:56,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=1949690.0, ans=0.1 2024-08-13 02:37:01,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1949690.0, ans=0.125 2024-08-13 02:37:26,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.34 vs. limit=22.5 2024-08-13 02:37:41,326 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6600, loss[loss=0.1111, beats_loss=0.01067, ecapa_loss=0.0001657, whisper_loss=0.0988, over 22344.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01084, ecapa_loss=0.0001697, whisper_loss=0.0927, over 3930605.15 frames. 
], batch size: 90, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:39:14,094 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.510e+01 2.757e+01 3.096e+01 4.067e+01, threshold=5.514e+01, percent-clipped=0.0 2024-08-13 02:39:18,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1950290.0, ans=10.0 2024-08-13 02:39:27,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1950390.0, ans=0.1 2024-08-13 02:39:39,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6650, loss[loss=0.1063, beats_loss=0.0113, ecapa_loss=0.0001901, whisper_loss=0.09305, over 19993.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01089, ecapa_loss=0.0001698, whisper_loss=0.09213, over 3934753.30 frames. ], batch size: 84, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:40:28,857 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 16 from LS+wenet, 31 from Vox, 29 from AS 2024-08-13 02:41:16,349 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6700, loss[loss=0.0891, beats_loss=0.009385, ecapa_loss=0.0002182, whisper_loss=0.07753, over 17149.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01086, ecapa_loss=0.0001706, whisper_loss=0.09255, over 3930513.76 frames. 
], batch size: 72, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:41:21,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1950990.0, ans=0.1 2024-08-13 02:41:38,242 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 02:41:46,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1951090.0, ans=0.125 2024-08-13 02:41:48,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1951090.0, ans=0.2 2024-08-13 02:41:59,908 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 from AS 2024-08-13 02:42:04,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.67 vs. limit=22.5 2024-08-13 02:42:05,261 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 14 from Vox, 32 from AS 2024-08-13 02:42:23,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+01 2.594e+01 2.894e+01 3.478e+01 5.381e+01, threshold=5.788e+01, percent-clipped=0.0 2024-08-13 02:42:31,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1951290.0, ans=0.0 2024-08-13 02:42:51,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1951390.0, ans=0.125 2024-08-13 02:43:00,120 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6750, loss[loss=0.1082, beats_loss=0.01012, ecapa_loss=0.0001763, whisper_loss=0.09636, over 21998.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01088, ecapa_loss=0.0001717, whisper_loss=0.09259, over 3926666.48 frames. 
], batch size: 88, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:43:07,985 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 from AS 2024-08-13 02:43:45,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1951690.0, ans=0.07 2024-08-13 02:43:48,048 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 27 from Vox, 22 from AS 2024-08-13 02:43:50,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1951690.0, ans=0.2 2024-08-13 02:44:16,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1951790.0, ans=0.2 2024-08-13 02:44:19,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1951790.0, ans=0.125 2024-08-13 02:44:39,480 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=15.0 2024-08-13 02:44:43,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1951890.0, ans=0.1 2024-08-13 02:44:57,379 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6800, loss[loss=0.0923, beats_loss=0.01248, ecapa_loss=0.0001334, whisper_loss=0.07849, over 19698.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01088, ecapa_loss=0.0001722, whisper_loss=0.09199, over 3900657.30 frames. ], batch size: 74, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:45:08,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1951990.0, ans=0.04949747468305833 2024-08-13 02:45:47,039 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
23 from LS+wenet, 24 from Vox, 25 from AS 2024-08-13 02:45:57,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1952190.0, ans=0.0 2024-08-13 02:46:04,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1952190.0, ans=0.0 2024-08-13 02:46:13,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-13 02:46:16,748 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.452e+01 2.716e+01 3.076e+01 4.037e+01, threshold=5.431e+01, percent-clipped=0.0 2024-08-13 02:46:52,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6850, loss[loss=0.1199, beats_loss=0.009933, ecapa_loss=0.0001413, whisper_loss=0.1086, over 21932.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.0001714, whisper_loss=0.09176, over 3887157.53 frames. ], batch size: 82, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:47:04,413 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2024-08-13 02:47:27,295 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 from AS 2024-08-13 02:47:28,851 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
21 from LS+wenet, 25 from Vox, 29 from AS 2024-08-13 02:48:07,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1952790.0, ans=0.125 2024-08-13 02:48:16,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1952790.0, ans=0.125 2024-08-13 02:48:18,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1952790.0, ans=0.0 2024-08-13 02:48:24,926 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.63 vs. limit=15.0 2024-08-13 02:48:30,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1952890.0, ans=0.0 2024-08-13 02:48:31,052 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 from AS 2024-08-13 02:48:37,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1952890.0, ans=0.2 2024-08-13 02:48:42,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1952990.0, ans=0.125 2024-08-13 02:48:43,137 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6900, loss[loss=0.1058, beats_loss=0.009179, ecapa_loss=0.000175, whisper_loss=0.09485, over 17469.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01093, ecapa_loss=0.0001705, whisper_loss=0.09191, over 3896477.15 frames. 
], batch size: 67, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:48:48,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1952990.0, ans=0.125 2024-08-13 02:48:50,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.55 vs. limit=10.0 2024-08-13 02:48:51,743 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2024-08-13 02:49:09,497 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 from AS 2024-08-13 02:49:17,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1953190.0, ans=0.125 2024-08-13 02:49:41,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.589e+01 2.754e+01 3.182e+01 2.951e+02, threshold=5.508e+01, percent-clipped=1.0 2024-08-13 02:49:41,734 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 8 from Vox, 28 from AS 2024-08-13 02:49:47,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1953290.0, ans=0.0 2024-08-13 02:49:47,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1953290.0, ans=0.2 2024-08-13 02:49:53,994 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 from AS 2024-08-13 02:50:07,472 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 6950, loss[loss=0.1054, beats_loss=0.009902, ecapa_loss=0.0001872, whisper_loss=0.09358, over 17506.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.011, ecapa_loss=0.000169, whisper_loss=0.09153, over 3899461.22 frames. 
], batch size: 70, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:50:07,641 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 26 from LS+wenet, 12 from Vox, 26 from AS 2024-08-13 02:50:26,644 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5 2024-08-13 02:50:27,999 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 from AS 2024-08-13 02:50:28,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1953590.0, ans=0.0 2024-08-13 02:50:43,867 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 22 from Vox, 29 from AS 2024-08-13 02:50:45,858 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.627e+00 2024-08-13 02:50:47,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1953690.0, ans=0.2 2024-08-13 02:50:55,041 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 24 from Vox, 25 from AS 2024-08-13 02:50:55,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1953690.0, ans=0.07 2024-08-13 02:51:03,219 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 23 from Vox, 24 from AS 2024-08-13 02:51:10,963 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 24 from Vox, 29 from AS 2024-08-13 02:51:17,070 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. 
limit=15.0 2024-08-13 02:51:18,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1953790.0, ans=0.125 2024-08-13 02:51:24,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1953890.0, ans=0.125 2024-08-13 02:51:36,412 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 from AS 2024-08-13 02:51:42,575 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7000, loss[loss=0.09559, beats_loss=0.01161, ecapa_loss=0.0001744, whisper_loss=0.08224, over 17171.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01093, ecapa_loss=0.0001692, whisper_loss=0.0914, over 3900185.14 frames. ], batch size: 70, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:51:57,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1953990.0, ans=0.0 2024-08-13 02:52:14,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1954090.0, ans=0.125 2024-08-13 02:52:17,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1954090.0, ans=0.125 2024-08-13 02:52:17,931 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 20 from Vox, 33 from AS 2024-08-13 02:52:28,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1954190.0, ans=0.0 2024-08-13 02:52:38,018 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 from AS 2024-08-13 02:52:39,758 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
25 from LS+wenet, 21 from Vox, 34 from AS 2024-08-13 02:52:47,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1954290.0, ans=0.1 2024-08-13 02:52:48,666 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.444e+01 2.710e+01 2.918e+01 4.538e+01, threshold=5.419e+01, percent-clipped=0.0 2024-08-13 02:52:55,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1954290.0, ans=0.125 2024-08-13 02:52:59,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1954390.0, ans=0.0 2024-08-13 02:53:02,053 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 from AS 2024-08-13 02:53:03,996 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 19 from Vox, 41 from AS 2024-08-13 02:53:09,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1954390.0, ans=0.0 2024-08-13 02:53:16,204 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7050, loss[loss=0.1157, beats_loss=0.01137, ecapa_loss=0.000135, whisper_loss=0.103, over 19158.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0109, ecapa_loss=0.0001693, whisper_loss=0.09108, over 3882486.06 frames. ], batch size: 73, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:53:24,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2024-08-13 02:53:43,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.26 vs. 
limit=12.0 2024-08-13 02:53:50,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1954590.0, ans=0.1 2024-08-13 02:54:01,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=12.0 2024-08-13 02:54:33,331 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 02:54:37,295 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 02:54:43,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1954890.0, ans=0.2 2024-08-13 02:54:44,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1954890.0, ans=0.125 2024-08-13 02:54:48,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7100, loss[loss=0.1165, beats_loss=0.008476, ecapa_loss=0.0001973, whisper_loss=0.106, over 22608.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01086, ecapa_loss=0.0001692, whisper_loss=0.09181, over 3893118.48 frames. ], batch size: 90, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:54:48,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1954990.0, ans=0.125 2024-08-13 02:54:51,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1954990.0, ans=0.0 2024-08-13 02:54:55,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1954990.0, ans=0.1 2024-08-13 02:55:04,311 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 02:55:31,406 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
27 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 02:55:38,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1955190.0, ans=0.2 2024-08-13 02:55:51,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1955290.0, ans=0.1 2024-08-13 02:55:52,220 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.441e+01 2.743e+01 3.182e+01 6.176e+01, threshold=5.486e+01, percent-clipped=2.0 2024-08-13 02:56:20,319 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7150, loss[loss=0.1014, beats_loss=0.01326, ecapa_loss=0.0002021, whisper_loss=0.08607, over 20429.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01086, ecapa_loss=0.000169, whisper_loss=0.09221, over 3899846.46 frames. ], batch size: 82, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:56:27,777 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 02:56:30,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.05 vs. limit=22.5 2024-08-13 02:56:40,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1955590.0, ans=0.0 2024-08-13 02:56:57,316 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 02:57:12,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1955690.0, ans=0.125 2024-08-13 02:57:16,684 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 02:57:38,219 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
23 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 02:57:52,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1955990.0, ans=0.0 2024-08-13 02:57:53,405 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7200, loss[loss=0.08487, beats_loss=0.01343, ecapa_loss=0.000138, whisper_loss=0.07006, over 16911.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01091, ecapa_loss=0.0001682, whisper_loss=0.09219, over 3889905.77 frames. ], batch size: 66, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:58:09,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.25 vs. limit=10.0 2024-08-13 02:58:14,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1956090.0, ans=0.125 2024-08-13 02:58:19,452 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 02:58:21,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2024-08-13 02:58:23,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2024-08-13 02:58:38,939 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 02:58:40,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1956190.0, ans=0.0 2024-08-13 02:58:48,102 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. 
limit=15.0 2024-08-13 02:58:56,452 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.408e+01 2.678e+01 2.996e+01 6.633e+01, threshold=5.357e+01, percent-clipped=2.0 2024-08-13 02:58:56,576 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 27 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-13 02:59:11,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0 2024-08-13 02:59:15,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1956390.0, ans=0.125 2024-08-13 02:59:23,715 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7250, loss[loss=0.1027, beats_loss=0.01025, ecapa_loss=0.0001745, whisper_loss=0.0907, over 22671.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0109, ecapa_loss=0.0001682, whisper_loss=0.09193, over 3895773.43 frames. ], batch size: 90, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:59:26,938 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 02:59:28,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1956490.0, ans=0.125 2024-08-13 02:59:50,301 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 03:00:04,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1956690.0, ans=0.125 2024-08-13 03:00:07,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1956690.0, ans=0.125 2024-08-13 03:00:21,875 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-13 03:00:36,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1956890.0, ans=0.0 2024-08-13 03:00:41,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1956890.0, ans=0.1 2024-08-13 03:00:53,093 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7300, loss[loss=0.09779, beats_loss=0.012, ecapa_loss=0.0001515, whisper_loss=0.08427, over 15173.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01091, ecapa_loss=0.000167, whisper_loss=0.09213, over 3888714.43 frames. ], batch size: 59, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:00:58,552 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 03:01:13,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1957090.0, ans=0.0 2024-08-13 03:01:13,542 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2024-08-13 03:01:15,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1957090.0, ans=0.125 2024-08-13 03:01:19,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1957090.0, ans=0.0 2024-08-13 03:01:20,560 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 32 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-13 03:01:43,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1957190.0, ans=0.125 2024-08-13 03:01:51,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.64 vs. 
limit=15.0 2024-08-13 03:01:52,085 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 03:01:55,241 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.448e+01 2.774e+01 3.121e+01 5.439e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-13 03:02:01,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1957290.0, ans=0.09899494936611666 2024-08-13 03:02:07,163 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-13 03:02:14,253 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.232e+00 2024-08-13 03:02:20,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1957490.0, ans=0.125 2024-08-13 03:02:21,003 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7350, loss[loss=0.08033, beats_loss=0.01151, ecapa_loss=0.0001949, whisper_loss=0.06687, over 15195.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01093, ecapa_loss=0.0001673, whisper_loss=0.09191, over 3920466.28 frames. ], batch size: 67, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:02:41,076 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 03:02:43,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1957590.0, ans=0.125 2024-08-13 03:03:10,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1957690.0, ans=0.0 2024-08-13 03:03:31,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1957890.0, ans=0.125 2024-08-13 03:03:34,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1957890.0, ans=0.125 2024-08-13 03:03:45,629 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7400, loss[loss=0.0924, beats_loss=0.01265, ecapa_loss=0.0001903, whisper_loss=0.07785, over 22486.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.0001675, whisper_loss=0.09131, over 3896420.44 frames. ], batch size: 97, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:03:46,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1957990.0, ans=0.1 2024-08-13 03:03:51,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1957990.0, ans=0.0 2024-08-13 03:04:27,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1958190.0, ans=0.125 2024-08-13 03:04:30,926 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 03:04:43,999 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.515e+01 2.775e+01 3.372e+01 5.725e+01, threshold=5.550e+01, percent-clipped=1.0 2024-08-13 03:05:02,595 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
23 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 03:05:03,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1958390.0, ans=0.125 2024-08-13 03:05:09,141 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7450, loss[loss=0.1067, beats_loss=0.009166, ecapa_loss=0.0001568, whisper_loss=0.09594, over 21165.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01093, ecapa_loss=0.0001686, whisper_loss=0.09137, over 3893987.26 frames. ], batch size: 81, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:05:20,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2024-08-13 03:05:22,391 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 16 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 03:05:33,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=1958590.0, ans=15.0 2024-08-13 03:05:49,860 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 03:05:53,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1958690.0, ans=0.125 2024-08-13 03:06:31,496 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7500, loss[loss=0.1186, beats_loss=0.01024, ecapa_loss=0.0001461, whisper_loss=0.1069, over 23671.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01092, ecapa_loss=0.0001678, whisper_loss=0.09131, over 3883118.64 frames. ], batch size: 93, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:06:37,294 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 03:06:50,201 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
29 from LS+wenet, 15 from Vox, 50 fro AS 2024-08-13 03:07:00,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1959090.0, ans=0.125 2024-08-13 03:07:13,785 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.27 vs. limit=15.0 2024-08-13 03:07:23,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1959290.0, ans=0.125 2024-08-13 03:07:28,719 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.459e+01 2.697e+01 3.000e+01 4.880e+01, threshold=5.394e+01, percent-clipped=0.0 2024-08-13 03:07:52,941 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7550, loss[loss=0.1097, beats_loss=0.009934, ecapa_loss=0.0001583, whisper_loss=0.09818, over 18072.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01093, ecapa_loss=0.0001686, whisper_loss=0.09076, over 3857110.32 frames. ], batch size: 71, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:07:58,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1959490.0, ans=0.0 2024-08-13 03:08:58,610 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 03:09:06,152 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 03:09:11,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7600, loss[loss=0.1114, beats_loss=0.007171, ecapa_loss=0.000227, whisper_loss=0.102, over 15986.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001704, whisper_loss=0.09144, over 3871743.44 frames. ], batch size: 62, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:09:27,737 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 03:09:40,094 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-13 03:10:08,252 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.552e+01 2.815e+01 3.111e+01 1.865e+02, threshold=5.629e+01, percent-clipped=3.0 2024-08-13 03:10:08,455 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 39 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 03:10:29,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1960390.0, ans=0.125 2024-08-13 03:10:32,103 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7650, loss[loss=0.1255, beats_loss=0.008294, ecapa_loss=0.0002094, whisper_loss=0.1151, over 22621.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001704, whisper_loss=0.09145, over 3880296.06 frames. ], batch size: 89, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:10:43,553 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 03:11:09,324 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 03:11:11,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1960690.0, ans=0.125 2024-08-13 03:11:12,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-13 03:11:16,466 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
21 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-13 03:11:26,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1960790.0, ans=0.125 2024-08-13 03:11:46,019 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-13 03:11:50,484 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7700, loss[loss=0.1095, beats_loss=0.01152, ecapa_loss=0.000165, whisper_loss=0.09634, over 19894.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01079, ecapa_loss=0.0001695, whisper_loss=0.09145, over 3874432.91 frames. ], batch size: 81, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:11:52,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1960990.0, ans=0.0 2024-08-13 03:12:05,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1961090.0, ans=0.125 2024-08-13 03:12:07,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1961090.0, ans=0.2 2024-08-13 03:12:07,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.04 vs. limit=22.5 2024-08-13 03:12:11,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1961090.0, ans=0.125 2024-08-13 03:12:44,226 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.500e+01 2.815e+01 3.285e+01 6.862e+01, threshold=5.629e+01, percent-clipped=1.0 2024-08-13 03:12:49,628 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
30 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 03:13:08,356 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7750, loss[loss=0.09131, beats_loss=0.01024, ecapa_loss=0.0001949, whisper_loss=0.07913, over 18161.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001687, whisper_loss=0.09093, over 3883901.61 frames. ], batch size: 76, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:13:10,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1961490.0, ans=0.025 2024-08-13 03:13:10,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.44 vs. limit=15.0 2024-08-13 03:13:10,865 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 27 from LS+wenet, 34 from Vox, 34 fro AS 2024-08-13 03:13:22,015 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 03:13:37,222 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-13 03:13:37,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1961590.0, ans=0.0 2024-08-13 03:14:00,514 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=15.0 2024-08-13 03:14:25,598 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7800, loss[loss=0.1016, beats_loss=0.01021, ecapa_loss=0.0001658, whisper_loss=0.08978, over 14688.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01087, ecapa_loss=0.0001693, whisper_loss=0.09009, over 3854194.86 frames. 
], batch size: 57, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:14:43,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1962090.0, ans=0.5 2024-08-13 03:14:45,643 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 03:14:49,151 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 03:15:19,798 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.419e+01 2.661e+01 3.121e+01 6.090e+01, threshold=5.321e+01, percent-clipped=1.0 2024-08-13 03:15:23,903 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-13 03:15:40,418 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 03:15:42,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1962490.0, ans=0.04949747468305833 2024-08-13 03:15:43,418 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7850, loss[loss=0.1166, beats_loss=0.009897, ecapa_loss=0.0001456, whisper_loss=0.1053, over 18362.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01092, ecapa_loss=0.0001687, whisper_loss=0.09013, over 3844060.42 frames. ], batch size: 71, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:15:45,361 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 03:16:23,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1962690.0, ans=0.0 2024-08-13 03:16:40,921 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 03:16:44,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1962890.0, ans=0.0 2024-08-13 03:16:54,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1962890.0, ans=0.125 2024-08-13 03:16:54,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1962890.0, ans=0.125 2024-08-13 03:16:59,995 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7900, loss[loss=0.1167, beats_loss=0.008397, ecapa_loss=0.0001474, whisper_loss=0.1068, over 20603.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0109, ecapa_loss=0.0001679, whisper_loss=0.09089, over 3854715.70 frames. ], batch size: 76, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:17:05,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1962990.0, ans=0.125 2024-08-13 03:17:22,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1963090.0, ans=0.125 2024-08-13 03:17:24,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1963090.0, ans=0.1 2024-08-13 03:17:52,576 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.438e+01 2.739e+01 3.083e+01 5.244e+01, threshold=5.477e+01, percent-clipped=0.0 2024-08-13 03:18:14,315 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 7950, loss[loss=0.1066, beats_loss=0.01051, ecapa_loss=0.0001758, whisper_loss=0.09432, over 18491.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001681, whisper_loss=0.09114, over 3851571.12 frames. 
], batch size: 69, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:18:32,533 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-13 03:19:13,578 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 03:19:22,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1963890.0, ans=0.0 2024-08-13 03:19:28,751 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8000, loss[loss=0.1035, beats_loss=0.01189, ecapa_loss=0.0001517, whisper_loss=0.09008, over 23697.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001678, whisper_loss=0.09129, over 3853000.91 frames. ], batch size: 94, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:19:31,340 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=12.0 2024-08-13 03:19:47,742 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.10 vs. limit=10.0 2024-08-13 03:19:56,993 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 03:20:02,309 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 03:20:06,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1964190.0, ans=0.0 2024-08-13 03:20:21,307 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.304e+01 2.712e+01 2.987e+01 5.432e+01, threshold=5.425e+01, percent-clipped=0.0 2024-08-13 03:20:21,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1964290.0, ans=0.2 2024-08-13 03:20:24,188 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 11 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 03:20:25,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1964290.0, ans=0.125 2024-08-13 03:20:40,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1964390.0, ans=0.125 2024-08-13 03:20:40,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1964390.0, ans=0.125 2024-08-13 03:20:42,394 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8050, loss[loss=0.0965, beats_loss=0.007717, ecapa_loss=0.0001832, whisper_loss=0.08696, over 14580.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.0001677, whisper_loss=0.0914, over 3830805.15 frames. ], batch size: 54, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:21:06,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1964590.0, ans=0.5 2024-08-13 03:21:08,558 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 03:21:23,177 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
21 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-13 03:21:30,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1964790.0, ans=0.125 2024-08-13 03:21:51,937 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8100, loss[loss=0.09968, beats_loss=0.01306, ecapa_loss=0.000145, whisper_loss=0.08518, over 22332.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.0001673, whisper_loss=0.09158, over 3837948.04 frames. ], batch size: 91, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:21:55,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1964990.0, ans=0.125 2024-08-13 03:22:08,054 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.283e-01 2024-08-13 03:22:09,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1965090.0, ans=0.125 2024-08-13 03:22:10,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1965090.0, ans=0.125 2024-08-13 03:22:13,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.33 vs. limit=22.5 2024-08-13 03:22:15,447 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 03:22:23,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2024-08-13 03:22:26,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.15 vs. limit=22.5 2024-08-13 03:22:30,783 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
28 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 03:22:39,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1965290.0, ans=0.5 2024-08-13 03:22:39,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.433e+01 2.725e+01 3.019e+01 1.220e+02, threshold=5.449e+01, percent-clipped=1.0 2024-08-13 03:22:46,206 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 03:22:53,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1965390.0, ans=0.0 2024-08-13 03:22:56,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1965390.0, ans=0.125 2024-08-13 03:23:01,248 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8150, loss[loss=0.05764, beats_loss=0.01658, ecapa_loss=0.000188, whisper_loss=0.03918, over 16224.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01092, ecapa_loss=0.0001665, whisper_loss=0.09109, over 3827884.86 frames. ], batch size: 74, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:23:13,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1965490.0, ans=0.2 2024-08-13 03:23:38,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1965690.0, ans=0.125 2024-08-13 03:23:48,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1965790.0, ans=0.125 2024-08-13 03:23:58,115 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 03:24:10,431 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8200, loss[loss=0.09712, beats_loss=0.01255, ecapa_loss=0.0001491, whisper_loss=0.08307, over 21829.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01091, ecapa_loss=0.0001673, whisper_loss=0.09115, over 3869476.99 frames. ], batch size: 90, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:24:25,548 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-13 03:24:29,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1966090.0, ans=0.0 2024-08-13 03:24:35,626 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 03:24:44,505 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 03:24:51,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1966290.0, ans=0.125 2024-08-13 03:24:58,605 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.561e+01 2.768e+01 3.091e+01 7.365e+01, threshold=5.537e+01, percent-clipped=2.0 2024-08-13 03:25:14,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1966390.0, ans=0.0 2024-08-13 03:25:17,967 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 03:25:18,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.21 vs. limit=10.0 2024-08-13 03:25:19,129 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8250, loss[loss=0.09306, beats_loss=0.01245, ecapa_loss=0.0001832, whisper_loss=0.07877, over 18776.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01095, ecapa_loss=0.0001671, whisper_loss=0.09144, over 3884305.50 frames. 
], batch size: 80, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:25:19,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1966490.0, ans=0.125 2024-08-13 03:25:21,764 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 14 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 03:25:43,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1966590.0, ans=0.125 2024-08-13 03:25:46,911 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 03:26:11,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1966890.0, ans=0.125 2024-08-13 03:26:13,609 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 32 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 03:26:14,093 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.624e+00 2024-08-13 03:26:25,398 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8300, loss[loss=0.1152, beats_loss=0.009675, ecapa_loss=0.000146, whisper_loss=0.1041, over 14389.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01089, ecapa_loss=0.0001677, whisper_loss=0.0917, over 3904116.07 frames. ], batch size: 54, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:26:34,727 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 03:26:45,407 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 03:26:54,826 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 03:26:56,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1967190.0, ans=0.125 2024-08-13 03:27:12,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.397e+01 2.699e+01 2.951e+01 6.635e+01, threshold=5.397e+01, percent-clipped=2.0 2024-08-13 03:27:14,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1967290.0, ans=0.125 2024-08-13 03:27:28,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1967390.0, ans=0.125 2024-08-13 03:27:33,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8350, loss[loss=0.128, beats_loss=0.00797, ecapa_loss=0.0001885, whisper_loss=0.1181, over 22951.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01085, ecapa_loss=0.0001669, whisper_loss=0.09231, over 3918665.70 frames. ], batch size: 89, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:28:00,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1967690.0, ans=0.2 2024-08-13 03:28:20,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1967790.0, ans=0.2 2024-08-13 03:28:22,738 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 03:28:29,945 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 03:28:32,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.10 vs. 
limit=15.0 2024-08-13 03:28:42,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8400, loss[loss=0.1161, beats_loss=0.01039, ecapa_loss=0.0001845, whisper_loss=0.1038, over 22700.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01083, ecapa_loss=0.0001688, whisper_loss=0.09285, over 3943596.42 frames. ], batch size: 92, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:28:44,737 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.64 vs. limit=22.5 2024-08-13 03:29:00,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1968090.0, ans=0.125 2024-08-13 03:29:03,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1968090.0, ans=0.125 2024-08-13 03:29:30,995 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.441e+01 2.759e+01 3.099e+01 1.310e+02, threshold=5.518e+01, percent-clipped=1.0 2024-08-13 03:29:45,039 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 03:29:48,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1968390.0, ans=0.2 2024-08-13 03:29:51,589 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8450, loss[loss=0.1148, beats_loss=0.01064, ecapa_loss=0.0001623, whisper_loss=0.1025, over 21951.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01081, ecapa_loss=0.0001688, whisper_loss=0.093, over 3942417.77 frames. ], batch size: 89, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:30:01,537 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
25 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 03:30:07,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1968590.0, ans=0.125 2024-08-13 03:30:17,838 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 03:30:28,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1968690.0, ans=0.125 2024-08-13 03:30:35,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1968790.0, ans=0.125 2024-08-13 03:30:36,002 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 03:30:54,913 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 03:30:59,954 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8500, loss[loss=0.1137, beats_loss=0.0108, ecapa_loss=0.0001361, whisper_loss=0.1015, over 23913.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0108, ecapa_loss=0.0001685, whisper_loss=0.09303, over 3944382.39 frames. ], batch size: 93, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:31:00,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1968990.0, ans=0.125 2024-08-13 03:31:06,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1968990.0, ans=15.0 2024-08-13 03:31:14,168 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 03:31:19,298 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-13 03:31:48,046 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.417e+01 2.734e+01 3.054e+01 8.886e+01, threshold=5.467e+01, percent-clipped=1.0 2024-08-13 03:32:08,467 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8550, loss[loss=0.1182, beats_loss=0.009237, ecapa_loss=0.0002093, whisper_loss=0.1069, over 22120.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01079, ecapa_loss=0.0001681, whisper_loss=0.09278, over 3919905.93 frames. ], batch size: 90, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:32:15,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1969490.0, ans=0.125 2024-08-13 03:32:15,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1969490.0, ans=0.1 2024-08-13 03:32:19,600 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 03:32:27,908 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 03:32:44,251 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-13 03:32:53,786 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 11 from Vox, 48 fro AS 2024-08-13 03:32:58,182 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 03:33:12,876 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
25 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-13 03:33:14,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1969890.0, ans=0.125 2024-08-13 03:33:16,971 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8600, loss[loss=0.1048, beats_loss=0.01, ecapa_loss=0.000228, whisper_loss=0.09256, over 16036.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01084, ecapa_loss=0.0001685, whisper_loss=0.09195, over 3880479.96 frames. ], batch size: 67, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:33:23,848 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0 2024-08-13 03:33:27,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.87 vs. limit=22.5 2024-08-13 03:33:34,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1970090.0, ans=0.2 2024-08-13 03:33:37,904 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=18.14 vs. limit=15.0 2024-08-13 03:33:40,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1970090.0, ans=0.0 2024-08-13 03:34:04,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1970290.0, ans=0.2 2024-08-13 03:34:06,837 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.434e+01 2.755e+01 2.994e+01 8.345e+01, threshold=5.511e+01, percent-clipped=1.0 2024-08-13 03:34:08,187 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. 
limit=15.0 2024-08-13 03:34:09,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1970290.0, ans=0.125 2024-08-13 03:34:28,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8650, loss[loss=0.1099, beats_loss=0.008494, ecapa_loss=0.0001517, whisper_loss=0.09985, over 17117.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01087, ecapa_loss=0.0001679, whisper_loss=0.09219, over 3902724.60 frames. ], batch size: 65, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:34:39,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0 2024-08-13 03:34:41,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1970490.0, ans=0.125 2024-08-13 03:34:45,407 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-13 03:34:47,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1970590.0, ans=0.125 2024-08-13 03:35:08,760 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-13 03:35:09,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1970690.0, ans=0.125 2024-08-13 03:35:16,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1970790.0, ans=0.125 2024-08-13 03:35:18,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.83 vs. 
limit=22.5 2024-08-13 03:35:21,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1970790.0, ans=0.2 2024-08-13 03:35:39,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1970890.0, ans=0.125 2024-08-13 03:35:40,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1970890.0, ans=0.125 2024-08-13 03:35:44,084 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8700, loss[loss=0.1092, beats_loss=0.01287, ecapa_loss=0.0001325, whisper_loss=0.09498, over 19292.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01094, ecapa_loss=0.0001681, whisper_loss=0.0918, over 3904036.30 frames. ], batch size: 74, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:35:45,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1970990.0, ans=0.125 2024-08-13 03:35:51,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1970990.0, ans=10.0 2024-08-13 03:35:54,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1970990.0, ans=0.2 2024-08-13 03:36:02,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1971090.0, ans=0.125 2024-08-13 03:36:09,288 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-13 03:36:11,327 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
13 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 03:36:12,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1971090.0, ans=0.125 2024-08-13 03:36:16,174 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-13 03:36:40,233 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.522e+01 2.761e+01 3.315e+01 1.069e+02, threshold=5.521e+01, percent-clipped=2.0 2024-08-13 03:36:44,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1971290.0, ans=0.025 2024-08-13 03:36:49,610 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2024-08-13 03:37:05,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8750, loss[loss=0.09009, beats_loss=0.009276, ecapa_loss=0.000197, whisper_loss=0.07884, over 20054.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01084, ecapa_loss=0.0001703, whisper_loss=0.09205, over 3911562.37 frames. ], batch size: 86, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:37:27,692 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 03:37:51,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-08-13 03:37:53,113 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-13 03:37:54,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1971790.0, ans=0.125 2024-08-13 03:38:12,028 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 03:38:20,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1971890.0, ans=0.0 2024-08-13 03:38:21,648 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 31 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 03:38:24,587 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8800, loss[loss=0.09462, beats_loss=0.009476, ecapa_loss=0.0002043, whisper_loss=0.0831, over 13313.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01086, ecapa_loss=0.0001702, whisper_loss=0.09259, over 3915508.32 frames. ], batch size: 54, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:38:26,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1971990.0, ans=0.2 2024-08-13 03:38:36,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1971990.0, ans=0.125 2024-08-13 03:38:39,277 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 26 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-13 03:38:51,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1972090.0, ans=0.0 2024-08-13 03:38:56,672 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.51 vs. limit=22.5 2024-08-13 03:39:14,200 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 03:39:23,242 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.400e+01 2.713e+01 2.983e+01 4.963e+01, threshold=5.426e+01, percent-clipped=0.0 2024-08-13 03:39:23,388 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 03:39:37,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1972390.0, ans=0.125 2024-08-13 03:39:40,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1972390.0, ans=0.125 2024-08-13 03:39:46,401 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8850, loss[loss=0.1032, beats_loss=0.009167, ecapa_loss=0.000177, whisper_loss=0.09229, over 17991.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01091, ecapa_loss=0.000169, whisper_loss=0.09247, over 3921134.63 frames. ], batch size: 72, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:39:46,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1972490.0, ans=0.125 2024-08-13 03:40:26,595 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 03:41:08,064 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8900, loss[loss=0.07854, beats_loss=0.01105, ecapa_loss=0.0001449, whisper_loss=0.06605, over 17253.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01096, ecapa_loss=0.0001672, whisper_loss=0.09244, over 3903704.31 frames. ], batch size: 65, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:41:10,446 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 03:41:18,139 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.732e+05 2024-08-13 03:41:27,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. 
limit=15.0 2024-08-13 03:41:29,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1973090.0, ans=0.1 2024-08-13 03:42:04,448 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 03:42:05,589 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.456e+01 2.768e+01 3.242e+01 5.170e+01, threshold=5.536e+01, percent-clipped=0.0 2024-08-13 03:42:14,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1973390.0, ans=0.1 2024-08-13 03:42:20,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2024-08-13 03:42:24,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1973390.0, ans=0.0 2024-08-13 03:42:29,739 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 8950, loss[loss=0.1261, beats_loss=0.01028, ecapa_loss=0.0001548, whisper_loss=0.1143, over 15052.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01088, ecapa_loss=0.0001675, whisper_loss=0.09331, over 3901991.33 frames. ], batch size: 56, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:42:45,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1973590.0, ans=0.0 2024-08-13 03:42:47,528 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 03:43:09,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1973690.0, ans=0.1 2024-08-13 03:43:17,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1973790.0, ans=0.0 2024-08-13 03:43:32,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2024-08-13 03:43:44,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1973890.0, ans=0.125 2024-08-13 03:43:47,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9000, loss[loss=0.1052, beats_loss=0.009654, ecapa_loss=0.0001844, whisper_loss=0.09371, over 16213.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0109, ecapa_loss=0.000167, whisper_loss=0.09295, over 3937378.81 frames. ], batch size: 64, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:43:47,934 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 03:44:28,195 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005752, whisper_loss=0.2484, over 922467.00 frames. 2024-08-13 03:44:46,382 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on SV_voxceleb1: loss=0.004584, beats_loss=0, ecapa_loss=0.0004584, whisper_loss=0, over 939242.00 frames. 2024-08-13 03:46:42,147 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on AT_audioset: loss=0.02386, beats_loss=0.02386, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-13 03:46:42,151 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-13 03:46:45,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1973990.0, ans=0.0 2024-08-13 03:46:51,452 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2024-08-13 03:47:36,938 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 33 from LS+wenet, 21 from Vox, 16 fro AS 2024-08-13 03:47:38,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.29 vs. limit=10.0 2024-08-13 03:47:41,781 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.559e+01 2.793e+01 3.240e+01 5.167e+01, threshold=5.585e+01, percent-clipped=0.0 2024-08-13 03:47:49,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1974390.0, ans=0.125 2024-08-13 03:48:03,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1974390.0, ans=0.05 2024-08-13 03:48:05,893 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 03:48:07,093 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9050, loss[loss=0.1013, beats_loss=0.01141, ecapa_loss=0.0001627, whisper_loss=0.08825, over 22232.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01085, ecapa_loss=0.0001682, whisper_loss=0.09343, over 3931970.31 frames. 
], batch size: 89, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:48:15,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1974490.0, ans=0.2 2024-08-13 03:48:23,959 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 03:48:33,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1974590.0, ans=0.0 2024-08-13 03:48:57,776 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 03:49:07,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1974790.0, ans=0.125 2024-08-13 03:49:28,248 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9100, loss[loss=0.1084, beats_loss=0.01191, ecapa_loss=0.0001878, whisper_loss=0.09466, over 22251.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01087, ecapa_loss=0.0001687, whisper_loss=0.09294, over 3949529.22 frames. ], batch size: 91, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:49:39,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1974990.0, ans=0.0 2024-08-13 03:49:41,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.24 vs. limit=6.0 2024-08-13 03:50:11,175 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.12 vs. limit=10.0 2024-08-13 03:50:16,078 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.45 vs. 
limit=15.0
2024-08-13 03:50:26,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.465e+01 2.794e+01 3.182e+01 5.687e+01, threshold=5.588e+01, percent-clipped=1.0
2024-08-13 03:50:33,871 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 from AS
2024-08-13 03:50:43,103 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 38 from LS+wenet, 25 from Vox, 26 from AS
2024-08-13 03:50:50,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0
2024-08-13 03:50:52,136 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9150, loss[loss=0.1002, beats_loss=0.01106, ecapa_loss=0.0001521, whisper_loss=0.08762, over 20436.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01091, ecapa_loss=0.0001683, whisper_loss=0.09253, over 3972866.81 frames. ], batch size: 79, lr: 4.47e-03, grad_scale: 1.152921504606847e+18
2024-08-13 03:50:53,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1975490.0, ans=0.5
2024-08-13 03:50:56,326 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 03:51:03,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1975490.0, ans=0.05
2024-08-13 03:51:04,292 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 from AS
2024-08-13 03:51:21,544 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0
2024-08-13 03:51:32,258 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 28 from Vox, 30 from AS
2024-08-13 03:51:50,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0
2024-08-13 03:51:59,816 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 from AS
2024-08-13 03:52:05,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1975890.0, ans=0.0
2024-08-13 03:52:06,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1975890.0, ans=0.2
2024-08-13 03:52:08,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1975890.0, ans=0.125
2024-08-13 03:52:13,412 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9200, loss[loss=0.1049, beats_loss=0.01078, ecapa_loss=0.0001879, whisper_loss=0.09228, over 16012.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01095, ecapa_loss=0.0001682, whisper_loss=0.0919, over 3965168.43 frames. ], batch size: 66, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:52:21,488 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 11 from Vox, 23 from AS
2024-08-13 03:52:35,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1976090.0, ans=0.125
2024-08-13 03:52:41,529 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 from AS
2024-08-13 03:52:56,197 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 29 from Vox, 40 from AS
2024-08-13 03:53:08,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1976290.0, ans=0.0
2024-08-13 03:53:09,973 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 18 from Vox, 28 from AS
2024-08-13 03:53:10,901 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0
2024-08-13 03:53:11,032 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.444e+01 2.723e+01 3.266e+01 6.783e+01, threshold=5.446e+01, percent-clipped=1.0
2024-08-13 03:53:28,511 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.377e+05
2024-08-13 03:53:32,646 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9250, loss[loss=0.1153, beats_loss=0.0104, ecapa_loss=0.0001432, whisper_loss=0.1035, over 23985.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001688, whisper_loss=0.09196, over 3951545.99 frames. ], batch size: 90, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:53:37,830 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 26 from LS+wenet, 18 from Vox, 18 from AS
2024-08-13 03:54:12,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1976690.0, ans=0.04949747468305833
2024-08-13 03:54:40,642 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0
2024-08-13 03:54:54,747 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9300, loss[loss=0.09638, beats_loss=0.01247, ecapa_loss=0.0001559, whisper_loss=0.08236, over 21853.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01093, ecapa_loss=0.0001679, whisper_loss=0.0923, over 3968461.98 frames. ], batch size: 89, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:54:55,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1976990.0, ans=0.125
2024-08-13 03:55:00,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1976990.0, ans=0.125
2024-08-13 03:55:42,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1977190.0, ans=10.0
2024-08-13 03:55:54,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1977290.0, ans=0.125
2024-08-13 03:55:55,404 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.460e+01 2.642e+01 2.957e+01 1.771e+02, threshold=5.283e+01, percent-clipped=2.0
2024-08-13 03:56:02,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1977390.0, ans=0.125
2024-08-13 03:56:09,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1977390.0, ans=0.125
2024-08-13 03:56:18,760 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9350, loss[loss=0.1148, beats_loss=0.009402, ecapa_loss=0.0001881, whisper_loss=0.1035, over 22444.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01094, ecapa_loss=0.0001672, whisper_loss=0.09218, over 3930938.84 frames. ], batch size: 91, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:56:20,770 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 20 from Vox, 27 from AS
2024-08-13 03:56:22,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1977490.0, ans=0.125
2024-08-13 03:56:26,723 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 30 from Vox, 30 from AS
2024-08-13 03:56:51,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1977690.0, ans=0.95
2024-08-13 03:57:00,023 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 from AS
2024-08-13 03:57:02,047 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 19 from Vox, 17 from AS
2024-08-13 03:57:03,401 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS
2024-08-13 03:57:21,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1977890.0, ans=0.125
2024-08-13 03:57:23,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1977890.0, ans=0.0
2024-08-13 03:57:23,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1977890.0, ans=0.0
2024-08-13 03:57:25,610 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 29 from Vox, 32 from AS
2024-08-13 03:57:34,239 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0
2024-08-13 03:57:36,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0
2024-08-13 03:57:38,653 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9400, loss[loss=0.1186, beats_loss=0.007282, ecapa_loss=0.0001525, whisper_loss=0.1098, over 17617.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01082, ecapa_loss=0.0001682, whisper_loss=0.09198, over 3895402.20 frames. ], batch size: 64, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:57:44,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0
2024-08-13 03:58:04,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1978090.0, ans=0.1
2024-08-13 03:58:08,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0
2024-08-13 03:58:33,731 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 19 from Vox, 44 from AS
2024-08-13 03:58:34,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0
2024-08-13 03:58:34,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.425e+01 2.641e+01 3.063e+01 7.732e+01, threshold=5.282e+01, percent-clipped=1.0
2024-08-13 03:58:43,079 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 from AS
2024-08-13 03:58:44,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1978390.0, ans=0.125
2024-08-13 03:58:47,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1978390.0, ans=0.0
2024-08-13 03:58:57,127 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9450, loss[loss=0.1323, beats_loss=0.008003, ecapa_loss=0.0001602, whisper_loss=0.1227, over 18470.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001671, whisper_loss=0.09168, over 3914870.93 frames. ], batch size: 71, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:59:11,840 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 19 from Vox, 40 from AS
2024-08-13 03:59:17,745 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS
2024-08-13 03:59:20,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1978590.0, ans=0.0
2024-08-13 03:59:22,756 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 20 from LS+wenet, 28 from Vox, 45 from AS
2024-08-13 03:59:25,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1978590.0, ans=0.2
2024-08-13 03:59:41,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1978690.0, ans=0.0
2024-08-13 03:59:57,305 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 from AS
2024-08-13 03:59:59,136 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 18 from Vox, 21 from AS
2024-08-13 04:00:08,280 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0
2024-08-13 04:00:17,076 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9500, loss[loss=0.1029, beats_loss=0.01026, ecapa_loss=0.0001752, whisper_loss=0.09085, over 22687.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.000166, whisper_loss=0.09153, over 3932281.36 frames. ], batch size: 90, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:00:27,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1978990.0, ans=0.1
2024-08-13 04:00:38,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1979090.0, ans=0.1
2024-08-13 04:00:44,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1979090.0, ans=0.125
2024-08-13 04:00:44,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=12.0
2024-08-13 04:01:09,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1979290.0, ans=10.0
2024-08-13 04:01:09,969 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 from AS
2024-08-13 04:01:12,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.497e+01 2.737e+01 3.144e+01 1.195e+02, threshold=5.474e+01, percent-clipped=3.0
2024-08-13 04:01:31,592 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0
2024-08-13 04:01:33,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9550, loss[loss=0.1012, beats_loss=0.0101, ecapa_loss=0.0001822, whisper_loss=0.08928, over 22010.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01082, ecapa_loss=0.0001662, whisper_loss=0.09167, over 3914888.33 frames. ], batch size: 91, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:01:40,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1979490.0, ans=0.125
2024-08-13 04:01:52,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1979590.0, ans=0.1
2024-08-13 04:02:12,134 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.27 vs. limit=10.0
2024-08-13 04:02:17,337 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.786e-01
2024-08-13 04:02:19,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.46 vs. limit=22.5
2024-08-13 04:02:34,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1979890.0, ans=0.125
2024-08-13 04:02:39,747 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 from AS
2024-08-13 04:02:46,649 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9600, loss[loss=0.08997, beats_loss=0.01203, ecapa_loss=0.0001415, whisper_loss=0.07653, over 16774.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01088, ecapa_loss=0.0001657, whisper_loss=0.09084, over 3894837.38 frames. ], batch size: 66, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:02:46,807 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 from AS
2024-08-13 04:02:47,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1979990.0, ans=0.1
2024-08-13 04:02:54,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1979990.0, ans=10.0
2024-08-13 04:03:04,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1980090.0, ans=0.125
2024-08-13 04:03:17,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1980190.0, ans=0.125
2024-08-13 04:03:22,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1980190.0, ans=0.2
2024-08-13 04:03:25,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1980190.0, ans=0.125
2024-08-13 04:03:31,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1980290.0, ans=0.0
2024-08-13 04:03:33,796 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 from AS
2024-08-13 04:03:36,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.597e+01 2.785e+01 3.117e+01 4.817e+01, threshold=5.569e+01, percent-clipped=0.0
2024-08-13 04:03:47,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1980390.0, ans=0.2
2024-08-13 04:03:51,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0
2024-08-13 04:03:53,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1980390.0, ans=0.1
2024-08-13 04:03:55,296 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9650, loss[loss=0.1171, beats_loss=0.008514, ecapa_loss=0.0001993, whisper_loss=0.1066, over 16367.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01083, ecapa_loss=0.0001663, whisper_loss=0.09128, over 3884693.58 frames. ], batch size: 63, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:03:55,463 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 21 from Vox, 23 from AS
2024-08-13 04:03:55,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1980490.0, ans=0.125
2024-08-13 04:03:59,602 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 15 from Vox, 42 from AS
2024-08-13 04:04:08,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1980590.0, ans=0.2
2024-08-13 04:04:12,627 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 20 from LS+wenet, 27 from Vox, 38 from AS
2024-08-13 04:04:15,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1980590.0, ans=0.025
2024-08-13 04:04:19,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1980590.0, ans=0.125
2024-08-13 04:04:26,450 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 21 from Vox, 33 from AS
2024-08-13 04:04:36,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1980790.0, ans=0.125
2024-08-13 04:04:44,402 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 28 from Vox, 30 from AS
2024-08-13 04:04:55,733 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 28 from Vox, 33 from AS
2024-08-13 04:05:03,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1980890.0, ans=0.125
2024-08-13 04:05:05,405 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9700, loss[loss=0.1116, beats_loss=0.008658, ecapa_loss=0.0001705, whisper_loss=0.1012, over 21036.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.0001681, whisper_loss=0.09128, over 3868936.53 frames. ], batch size: 81, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:05:34,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1981190.0, ans=0.125
2024-08-13 04:05:55,608 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.457e+01 2.661e+01 2.979e+01 4.854e+01, threshold=5.323e+01, percent-clipped=0.0
2024-08-13 04:05:58,634 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 23 from Vox, 33 from AS
2024-08-13 04:06:14,738 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9750, loss[loss=0.09275, beats_loss=0.01088, ecapa_loss=0.0001853, whisper_loss=0.08002, over 17447.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01079, ecapa_loss=0.0001679, whisper_loss=0.09163, over 3870791.30 frames. ], batch size: 71, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:06:20,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1981490.0, ans=0.125
2024-08-13 04:06:25,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1981490.0, ans=0.05
2024-08-13 04:06:31,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1981590.0, ans=0.0
2024-08-13 04:06:42,115 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 15 from Vox, 23 from AS
2024-08-13 04:06:52,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1981690.0, ans=0.125
2024-08-13 04:07:12,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.30 vs. limit=22.5
2024-08-13 04:07:13,296 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 23 from Vox, 30 from AS
2024-08-13 04:07:24,302 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9800, loss[loss=0.1046, beats_loss=0.01218, ecapa_loss=0.0001426, whisper_loss=0.09098, over 22832.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0109, ecapa_loss=0.0001671, whisper_loss=0.09088, over 3878583.86 frames. ], batch size: 91, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:07:39,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1982090.0, ans=0.0
2024-08-13 04:07:42,957 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 25 from LS+wenet, 16 from Vox, 19 from AS
2024-08-13 04:07:49,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1982090.0, ans=0.125
2024-08-13 04:07:57,192 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 from AS
2024-08-13 04:08:04,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1982190.0, ans=0.125
2024-08-13 04:08:08,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1982290.0, ans=0.125
2024-08-13 04:08:15,261 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.387e+01 2.562e+01 2.934e+01 4.315e+01, threshold=5.124e+01, percent-clipped=0.0
2024-08-13 04:08:22,558 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 from AS
2024-08-13 04:08:26,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1982390.0, ans=0.1
2024-08-13 04:08:34,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9850, loss[loss=0.1052, beats_loss=0.01145, ecapa_loss=0.0001599, whisper_loss=0.09217, over 17127.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01084, ecapa_loss=0.0001678, whisper_loss=0.09219, over 3899311.00 frames. ], batch size: 68, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:08:40,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1982490.0, ans=0.1
2024-08-13 04:08:47,254 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 14 from Vox, 31 from AS
2024-08-13 04:08:52,043 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.35 vs. limit=15.0
2024-08-13 04:08:59,946 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0
2024-08-13 04:09:09,577 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 16 from Vox, 37 from AS
2024-08-13 04:09:14,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1982690.0, ans=0.125
2024-08-13 04:09:18,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1982790.0, ans=0.0
2024-08-13 04:09:25,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1982790.0, ans=0.2
2024-08-13 04:09:28,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1982790.0, ans=0.1
2024-08-13 04:09:28,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0
2024-08-13 04:09:40,013 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 from AS
2024-08-13 04:09:44,014 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9900, loss[loss=0.1219, beats_loss=0.009342, ecapa_loss=0.0001639, whisper_loss=0.1109, over 17304.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01091, ecapa_loss=0.0001681, whisper_loss=0.09195, over 3885958.52 frames. ], batch size: 65, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:09:44,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1982990.0, ans=0.0
2024-08-13 04:09:55,613 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 27 from LS+wenet, 18 from Vox, 14 from AS
2024-08-13 04:09:59,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1983090.0, ans=0.125
2024-08-13 04:10:01,469 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0
2024-08-13 04:10:02,293 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 23 from Vox, 28 from AS
2024-08-13 04:10:10,738 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 14 from Vox, 34 from AS
2024-08-13 04:10:10,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1983190.0, ans=0.125
2024-08-13 04:10:13,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1983190.0, ans=0.125
2024-08-13 04:10:17,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1983190.0, ans=0.0
2024-08-13 04:10:24,616 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 25 from LS+wenet, 10 from Vox, 26 from AS
2024-08-13 04:10:24,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1983290.0, ans=0.125
2024-08-13 04:10:34,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.489e+01 2.832e+01 3.268e+01 9.650e+01, threshold=5.664e+01, percent-clipped=3.0
2024-08-13 04:10:42,386 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 21 from Vox, 37 from AS
2024-08-13 04:10:53,006 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 9950, loss[loss=0.0998, beats_loss=0.01378, ecapa_loss=0.0001309, whisper_loss=0.08471, over 18134.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01092, ecapa_loss=0.0001671, whisper_loss=0.0924, over 3883028.98 frames. ], batch size: 70, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:11:06,966 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 12 from Vox, 34 from AS
2024-08-13 04:11:35,084 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.28 vs. limit=15.0
2024-08-13 04:11:44,225 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 17 from Vox, 35 from AS
2024-08-13 04:12:01,994 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10000, loss[loss=0.0948, beats_loss=0.0139, ecapa_loss=0.0001388, whisper_loss=0.07951, over 18652.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01084, ecapa_loss=0.0001678, whisper_loss=0.09239, over 3853243.53 frames. ], batch size: 75, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:12:02,154 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 from AS
2024-08-13 04:12:02,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1983990.0, ans=0.125
2024-08-13 04:12:04,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0
2024-08-13 04:12:20,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1984090.0, ans=0.125
2024-08-13 04:12:42,212 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 11 from Vox, 31 from AS
2024-08-13 04:12:42,807 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.04 vs. limit=22.5
2024-08-13 04:12:45,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1984290.0, ans=0.1
2024-08-13 04:12:52,309 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.356e+01 2.631e+01 2.870e+01 5.046e+01, threshold=5.261e+01, percent-clipped=0.0
2024-08-13 04:13:06,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1984390.0, ans=0.125
2024-08-13 04:13:08,359 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0
2024-08-13 04:13:11,564 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10050, loss[loss=0.09695, beats_loss=0.01239, ecapa_loss=0.0001509, whisper_loss=0.08305, over 20694.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.000168, whisper_loss=0.09155, over 3870357.87 frames. ], batch size: 80, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:13:12,535 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0
2024-08-13 04:13:27,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1984590.0, ans=0.5
2024-08-13 04:13:43,977 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.26 vs. limit=6.0
2024-08-13 04:13:50,611 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 04:13:51,517 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 from AS
2024-08-13 04:13:52,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1984790.0, ans=0.95
2024-08-13 04:13:59,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1984790.0, ans=0.125
2024-08-13 04:13:59,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0
2024-08-13 04:14:00,032 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 from AS
2024-08-13 04:14:03,558 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.43 vs. limit=22.5
2024-08-13 04:14:05,587 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 28 from Vox, 35 from AS
2024-08-13 04:14:05,858 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 04:14:09,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1984890.0, ans=0.035
2024-08-13 04:14:20,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10100, loss[loss=0.0988, beats_loss=0.01018, ecapa_loss=0.0001542, whisper_loss=0.08707, over 15212.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01088, ecapa_loss=0.0001709, whisper_loss=0.09136, over 3871311.89 frames. ], batch size: 57, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:14:28,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1984990.0, ans=0.1
2024-08-13 04:14:34,751 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 26 from Vox, 30 from AS
2024-08-13 04:14:41,775 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 16 from Vox, 21 from AS
2024-08-13 04:14:44,284 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 17 from Vox, 37 from AS
2024-08-13 04:14:58,198 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 29 from LS+wenet, 18 from Vox, 24 from AS
2024-08-13 04:14:59,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1985190.0, ans=0.125
2024-08-13 04:15:09,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=15.0
2024-08-13 04:15:10,115 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.445e+01 2.629e+01 3.089e+01 3.463e+02, threshold=5.257e+01, percent-clipped=1.0
2024-08-13 04:15:29,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10150, loss[loss=0.1048, beats_loss=0.01291, ecapa_loss=0.0001502, whisper_loss=0.09038, over 18620.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01082, ecapa_loss=0.0001718, whisper_loss=0.09148, over 3883777.23 frames. ], batch size: 75, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:15:52,497 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0
2024-08-13 04:15:53,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1985590.0, ans=0.1
2024-08-13 04:16:05,602 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 from AS
2024-08-13 04:16:11,270 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 28 from Vox, 20 from AS
2024-08-13 04:16:15,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1985790.0, ans=0.1
2024-08-13 04:16:23,540 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 from AS
2024-08-13 04:16:35,688 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 29 from Vox, 33 from AS
2024-08-13 04:16:38,156 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10200, loss[loss=0.1271, beats_loss=0.009437, ecapa_loss=0.0001861, whisper_loss=0.1158, over 14228.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.0001725, whisper_loss=0.09173, over 3891493.52 frames. ], batch size: 55, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:16:58,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1986090.0, ans=0.0
2024-08-13 04:17:06,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1986190.0, ans=0.125
2024-08-13 04:17:09,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1986190.0, ans=0.0
2024-08-13 04:17:27,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.34 vs. limit=22.5
2024-08-13 04:17:27,733 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.440e+01 2.685e+01 3.230e+01 3.990e+01, threshold=5.370e+01, percent-clipped=0.0
2024-08-13 04:17:34,949 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 27 from Vox, 28 from AS
2024-08-13 04:17:39,337 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 04:17:40,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1986390.0, ans=0.125
2024-08-13 04:17:46,996 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10250, loss[loss=0.1237, beats_loss=0.008421, ecapa_loss=0.0001916, whisper_loss=0.1134, over 17783.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01073, ecapa_loss=0.0001719, whisper_loss=0.09198, over 3880072.10 frames. ], batch size: 69, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:17:59,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1986590.0, ans=0.125
2024-08-13 04:18:01,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1986590.0, ans=0.125
2024-08-13 04:18:07,492 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 27 from Vox, 41 from AS
2024-08-13 04:18:27,782 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 29 from LS+wenet, 16 from Vox, 26 from AS
2024-08-13 04:18:40,265 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 from AS
2024-08-13 04:18:46,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0
2024-08-13 04:18:55,405 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10300, loss[loss=0.1065, beats_loss=0.01003, ecapa_loss=0.0002498, whisper_loss=0.09394, over 14669.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01078, ecapa_loss=0.0001728, whisper_loss=0.09202, over 3885137.63 frames.
], batch size: 62, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:18:58,334 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-13 04:18:59,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1986990.0, ans=0.04949747468305833 2024-08-13 04:19:02,400 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-13 04:19:05,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1986990.0, ans=0.125 2024-08-13 04:19:08,018 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-13 04:19:18,040 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=12.0 2024-08-13 04:19:21,765 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.604e+01 2024-08-13 04:19:44,132 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.484e+01 2.743e+01 3.118e+01 4.422e+01, threshold=5.485e+01, percent-clipped=0.0 2024-08-13 04:19:44,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1987290.0, ans=0.0 2024-08-13 04:19:44,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1987290.0, ans=0.125 2024-08-13 04:20:03,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10350, loss[loss=0.1147, beats_loss=0.01219, ecapa_loss=0.0001521, whisper_loss=0.101, over 22733.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01089, ecapa_loss=0.0001717, whisper_loss=0.09141, over 3905954.32 frames. 
], batch size: 89, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:20:06,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1987490.0, ans=0.125 2024-08-13 04:20:22,500 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 04:20:30,567 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-13 04:20:32,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1987690.0, ans=0.125 2024-08-13 04:20:43,148 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 04:20:55,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1987790.0, ans=0.125 2024-08-13 04:21:10,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1987890.0, ans=0.1 2024-08-13 04:21:11,996 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10400, loss[loss=0.1165, beats_loss=0.00825, ecapa_loss=0.0001531, whisper_loss=0.1067, over 22631.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01083, ecapa_loss=0.0001706, whisper_loss=0.09226, over 3909974.47 frames. 
], batch size: 90, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:21:29,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1988090.0, ans=0.125 2024-08-13 04:21:32,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1988090.0, ans=0.0 2024-08-13 04:21:37,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1988090.0, ans=0.125 2024-08-13 04:21:56,347 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 33 from Vox, 34 fro AS 2024-08-13 04:22:01,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.435e+01 2.770e+01 3.094e+01 5.065e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-13 04:22:13,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-13 04:22:21,437 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10450, loss[loss=0.1003, beats_loss=0.01368, ecapa_loss=0.000175, whisper_loss=0.08487, over 20696.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001706, whisper_loss=0.09248, over 3888536.73 frames. ], batch size: 87, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:22:25,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1988490.0, ans=0.025 2024-08-13 04:22:51,031 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.85 vs. limit=15.0 2024-08-13 04:23:17,964 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
17 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 04:23:30,141 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10500, loss[loss=0.09889, beats_loss=0.0132, ecapa_loss=0.0001478, whisper_loss=0.08421, over 20077.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01084, ecapa_loss=0.00017, whisper_loss=0.0917, over 3870249.95 frames. ], batch size: 82, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:23:44,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1989090.0, ans=0.125 2024-08-13 04:23:45,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1989090.0, ans=0.2 2024-08-13 04:23:46,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1989090.0, ans=0.0 2024-08-13 04:23:50,223 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2024-08-13 04:23:58,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1989190.0, ans=0.2 2024-08-13 04:23:58,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1989190.0, ans=0.0 2024-08-13 04:24:21,491 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.377e+01 2.646e+01 2.972e+01 5.578e+01, threshold=5.291e+01, percent-clipped=1.0 2024-08-13 04:24:36,811 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 04:24:43,045 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10550, loss[loss=0.1154, beats_loss=0.0106, ecapa_loss=0.0002144, whisper_loss=0.1027, over 21479.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01077, ecapa_loss=0.0001703, whisper_loss=0.09205, over 3861735.22 frames. 
], batch size: 91, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:25:37,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1989790.0, ans=0.125 2024-08-13 04:25:43,922 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 04:25:54,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1989890.0, ans=0.125 2024-08-13 04:26:00,241 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10600, loss[loss=0.09359, beats_loss=0.01175, ecapa_loss=0.0001392, whisper_loss=0.08045, over 22162.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0108, ecapa_loss=0.0001706, whisper_loss=0.09149, over 3871353.02 frames. ], batch size: 89, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:26:05,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1989990.0, ans=0.0 2024-08-13 04:26:11,565 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=15.0 2024-08-13 04:26:15,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1990090.0, ans=0.025 2024-08-13 04:26:27,414 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 04:26:30,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1990190.0, ans=0.125 2024-08-13 04:26:37,250 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.132e-02 2024-08-13 04:26:39,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1990190.0, ans=0.04949747468305833 2024-08-13 04:26:54,444 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.291e+01 2.645e+01 2.934e+01 5.325e+01, threshold=5.289e+01, percent-clipped=1.0 2024-08-13 04:27:15,539 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10650, loss[loss=0.1151, beats_loss=0.009967, ecapa_loss=0.0001882, whisper_loss=0.1032, over 21915.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01077, ecapa_loss=0.0001704, whisper_loss=0.09202, over 3885944.86 frames. ], batch size: 89, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:27:18,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-08-13 04:27:21,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.20 vs. limit=10.0 2024-08-13 04:27:24,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1990490.0, ans=0.2 2024-08-13 04:27:27,073 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-13 04:27:49,485 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 04:27:54,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1990690.0, ans=0.2 2024-08-13 04:28:16,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2024-08-13 04:28:24,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1990890.0, ans=0.1 2024-08-13 04:28:27,145 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 04:28:35,229 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10700, loss[loss=0.0821, beats_loss=0.01306, ecapa_loss=0.0001496, whisper_loss=0.06754, over 14827.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01075, ecapa_loss=0.0001696, whisper_loss=0.09271, over 3872405.88 frames. ], batch size: 60, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:28:45,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1990990.0, ans=0.2 2024-08-13 04:28:51,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=12.0 2024-08-13 04:29:14,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1991190.0, ans=0.07 2024-08-13 04:29:15,537 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-13 04:29:24,468 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
31 from LS+wenet, 12 from Vox, 50 fro AS 2024-08-13 04:29:30,309 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.433e+01 2.666e+01 3.252e+01 5.472e+01, threshold=5.332e+01, percent-clipped=1.0 2024-08-13 04:29:37,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1991390.0, ans=0.125 2024-08-13 04:29:37,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1991390.0, ans=0.2 2024-08-13 04:29:48,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1991390.0, ans=0.125 2024-08-13 04:29:52,719 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10750, loss[loss=0.1096, beats_loss=0.01055, ecapa_loss=0.0001405, whisper_loss=0.09761, over 18605.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01078, ecapa_loss=0.0001679, whisper_loss=0.09291, over 3873220.57 frames. ], batch size: 68, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:30:25,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1991690.0, ans=0.125 2024-08-13 04:30:26,150 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-13 04:30:37,915 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 04:30:39,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1991790.0, ans=0.125 2024-08-13 04:30:39,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1991790.0, ans=0.2 2024-08-13 04:31:02,656 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 04:31:09,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.31 vs. limit=22.5 2024-08-13 04:31:10,132 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 04:31:12,413 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 04:31:13,579 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10800, loss[loss=0.1152, beats_loss=0.01161, ecapa_loss=0.0001523, whisper_loss=0.102, over 22208.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01083, ecapa_loss=0.0001672, whisper_loss=0.0933, over 3911641.65 frames. ], batch size: 90, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:31:27,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.38 vs. limit=22.5 2024-08-13 04:31:42,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0 2024-08-13 04:31:43,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1992090.0, ans=0.125 2024-08-13 04:31:52,763 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 04:32:05,124 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 04:32:08,337 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
19 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-13 04:32:10,798 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.408e+01 2.896e+01 3.475e+01 4.951e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-13 04:32:32,914 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10850, loss[loss=0.1081, beats_loss=0.01141, ecapa_loss=0.0001412, whisper_loss=0.0953, over 20013.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01082, ecapa_loss=0.000168, whisper_loss=0.09345, over 3916446.28 frames. ], batch size: 75, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:32:41,828 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 04:33:22,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1992790.0, ans=0.125 2024-08-13 04:33:23,973 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 04:33:33,539 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-08-13 04:33:34,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1992890.0, ans=0.125 2024-08-13 04:33:39,219 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 04:33:51,682 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10900, loss[loss=0.09687, beats_loss=0.01185, ecapa_loss=0.0001824, whisper_loss=0.08319, over 17792.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01082, ecapa_loss=0.0001674, whisper_loss=0.09328, over 3941232.17 frames. ], batch size: 73, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:34:22,847 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
24 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 04:34:41,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1993290.0, ans=0.125 2024-08-13 04:34:47,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1993290.0, ans=0.0 2024-08-13 04:34:49,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1993290.0, ans=0.0 2024-08-13 04:34:51,769 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.538e+01 2.794e+01 3.172e+01 4.370e+01, threshold=5.589e+01, percent-clipped=0.0 2024-08-13 04:34:51,908 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-13 04:35:03,534 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-13 04:35:12,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 10950, loss[loss=0.08324, beats_loss=0.01418, ecapa_loss=0.0001437, whisper_loss=0.06762, over 22210.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01092, ecapa_loss=0.0001675, whisper_loss=0.09261, over 3974188.68 frames. ], batch size: 92, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:35:26,250 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.64 vs. 
limit=15.0 2024-08-13 04:35:36,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=1993590.0, ans=0.2 2024-08-13 04:35:59,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1993790.0, ans=0.125 2024-08-13 04:36:09,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1993790.0, ans=0.04949747468305833 2024-08-13 04:36:18,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-13 04:36:33,422 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11000, loss[loss=0.08582, beats_loss=0.009949, ecapa_loss=0.0002607, whisper_loss=0.07327, over 12942.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01081, ecapa_loss=0.000169, whisper_loss=0.09302, over 3971111.65 frames. ], batch size: 55, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:36:35,022 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 04:36:38,681 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 04:36:39,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1993990.0, ans=0.0 2024-08-13 04:36:40,063 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 04:36:46,378 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 04:37:18,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. 
limit=6.0 2024-08-13 04:37:31,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1994290.0, ans=0.0 2024-08-13 04:37:33,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.385e+01 2.603e+01 2.980e+01 9.171e+01, threshold=5.207e+01, percent-clipped=2.0 2024-08-13 04:37:43,163 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-13 04:37:54,192 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11050, loss[loss=0.07712, beats_loss=0.01224, ecapa_loss=0.0001451, whisper_loss=0.06343, over 16947.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01081, ecapa_loss=0.0001689, whisper_loss=0.09279, over 3973353.09 frames. ], batch size: 68, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:37:57,512 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 04:38:06,549 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 04:38:20,781 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2024-08-13 04:38:40,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1994790.0, ans=0.2 2024-08-13 04:38:59,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1994890.0, ans=0.125 2024-08-13 04:39:18,241 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11100, loss[loss=0.1018, beats_loss=0.0127, ecapa_loss=0.0001462, whisper_loss=0.08759, over 22924.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01078, ecapa_loss=0.0001687, whisper_loss=0.09283, over 3978604.00 frames. 
], batch size: 93, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:39:25,514 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 04:39:43,759 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2024-08-13 04:40:18,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1995290.0, ans=0.125 2024-08-13 04:40:23,373 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.346e+01 2.633e+01 2.953e+01 4.555e+01, threshold=5.265e+01, percent-clipped=0.0 2024-08-13 04:40:27,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1995290.0, ans=0.0 2024-08-13 04:40:52,082 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11150, loss[loss=0.09731, beats_loss=0.01033, ecapa_loss=0.0001638, whisper_loss=0.08534, over 13550.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01074, ecapa_loss=0.0001689, whisper_loss=0.09299, over 3943149.30 frames. ], batch size: 54, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:41:02,094 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 04:41:24,951 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.14 vs. 
limit=15.0 2024-08-13 04:41:26,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1995590.0, ans=0.09899494936611666 2024-08-13 04:41:40,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1995690.0, ans=0.0 2024-08-13 04:41:51,991 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 13 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 04:41:54,594 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 04:42:01,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1995790.0, ans=0.0 2024-08-13 04:42:15,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1995790.0, ans=10.0 2024-08-13 04:42:15,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0 2024-08-13 04:42:16,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.40 vs. limit=22.5 2024-08-13 04:42:16,593 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-13 04:42:31,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1995890.0, ans=0.125 2024-08-13 04:42:34,210 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.74 vs. limit=12.0 2024-08-13 04:42:37,736 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
16 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 04:42:40,698 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11200, loss[loss=0.08031, beats_loss=0.01135, ecapa_loss=0.0001974, whisper_loss=0.06698, over 14194.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01072, ecapa_loss=0.0001686, whisper_loss=0.09305, over 3933352.31 frames. ], batch size: 61, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:42:48,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1995990.0, ans=0.125 2024-08-13 04:42:49,192 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 04:43:48,734 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 04:43:57,308 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-13 04:44:12,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.527e+01 2.790e+01 3.048e+01 4.600e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-13 04:44:47,873 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11250, loss[loss=0.09617, beats_loss=0.01264, ecapa_loss=0.0001524, whisper_loss=0.082, over 16850.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01074, ecapa_loss=0.0001693, whisper_loss=0.0921, over 3907942.55 frames. 
], batch size: 68, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:44:48,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1996490.0, ans=0.125 2024-08-13 04:45:23,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1996590.0, ans=0.2 2024-08-13 04:45:26,739 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.27 vs. limit=22.5 2024-08-13 04:45:37,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-08-13 04:45:38,763 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 30 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-13 04:45:47,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1996690.0, ans=0.2 2024-08-13 04:46:01,334 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-13 04:46:17,940 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 04:46:25,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1996890.0, ans=0.125 2024-08-13 04:46:27,975 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 04:46:41,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1996890.0, ans=0.0 2024-08-13 04:46:52,348 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11300, loss[loss=0.1072, beats_loss=0.01006, ecapa_loss=0.0001618, whisper_loss=0.09549, over 23575.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01075, ecapa_loss=0.0001683, whisper_loss=0.09166, over 3892614.37 frames. ], batch size: 91, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:47:08,490 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 04:47:13,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1996990.0, ans=0.125 2024-08-13 04:47:57,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1997190.0, ans=0.0 2024-08-13 04:48:27,581 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.451e+01 2.765e+01 3.179e+01 5.185e+01, threshold=5.530e+01, percent-clipped=0.0 2024-08-13 04:48:42,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-08-13 04:48:55,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1997390.0, ans=0.1 2024-08-13 04:48:59,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1997490.0, ans=0.125 2024-08-13 04:48:59,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. limit=10.0 2024-08-13 04:49:00,008 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11350, loss[loss=0.1077, beats_loss=0.01002, ecapa_loss=0.0002289, whisper_loss=0.09544, over 21815.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01064, ecapa_loss=0.0001698, whisper_loss=0.09245, over 3901715.67 frames. ], batch size: 94, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:49:18,171 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 04:49:18,836 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0 2024-08-13 04:49:26,115 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-13 04:49:32,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1997590.0, ans=0.125 2024-08-13 04:49:36,588 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-13 04:49:50,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1997690.0, ans=0.0 2024-08-13 04:50:14,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1997890.0, ans=0.2 2024-08-13 04:50:29,809 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11400, loss[loss=0.1099, beats_loss=0.01275, ecapa_loss=0.0001417, whisper_loss=0.09576, over 21893.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01075, ecapa_loss=0.0001689, whisper_loss=0.09168, over 3907007.30 frames. ], batch size: 87, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:50:33,274 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 04:50:49,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.32 vs. limit=12.0 2024-08-13 04:50:50,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1998090.0, ans=0.2 2024-08-13 04:51:00,346 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.155e+01 2024-08-13 04:51:13,218 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
17 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 04:51:30,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1998290.0, ans=0.0 2024-08-13 04:51:39,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.469e+01 2.790e+01 3.072e+01 4.491e+01, threshold=5.580e+01, percent-clipped=0.0 2024-08-13 04:51:40,608 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.73 vs. limit=10.0 2024-08-13 04:52:03,960 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11450, loss[loss=0.11, beats_loss=0.01071, ecapa_loss=0.0001547, whisper_loss=0.09776, over 23364.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01083, ecapa_loss=0.0001693, whisper_loss=0.09159, over 3931157.23 frames. ], batch size: 93, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:52:04,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1998490.0, ans=0.125 2024-08-13 04:52:10,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1998490.0, ans=0.0 2024-08-13 04:52:15,073 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 04:52:21,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1998590.0, ans=0.125 2024-08-13 04:52:42,803 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 04:52:47,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1998690.0, ans=0.125 2024-08-13 04:53:14,252 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 04:53:26,221 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 04:53:38,118 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11500, loss[loss=0.1114, beats_loss=0.01108, ecapa_loss=0.0001495, whisper_loss=0.09887, over 20087.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01074, ecapa_loss=0.00017, whisper_loss=0.09291, over 3916601.83 frames. ], batch size: 79, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:53:39,027 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 04:53:44,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1998990.0, ans=0.025 2024-08-13 04:53:51,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1998990.0, ans=0.07 2024-08-13 04:54:19,934 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2024-08-13 04:54:27,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.02 vs. limit=10.0 2024-08-13 04:54:34,938 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 04:54:43,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1999290.0, ans=0.1 2024-08-13 04:54:45,330 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. 
limit=15.0 2024-08-13 04:54:45,761 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.530e+01 2.837e+01 3.156e+01 6.576e+01, threshold=5.675e+01, percent-clipped=1.0 2024-08-13 04:55:02,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1999390.0, ans=0.0 2024-08-13 04:55:03,449 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-13 04:55:07,899 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11550, loss[loss=0.09048, beats_loss=0.01154, ecapa_loss=0.0001735, whisper_loss=0.0772, over 21009.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01068, ecapa_loss=0.0001684, whisper_loss=0.09312, over 3895760.49 frames. ], batch size: 89, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:55:11,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1999490.0, ans=0.125 2024-08-13 04:55:12,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1999490.0, ans=0.2 2024-08-13 04:55:12,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1999490.0, ans=0.125 2024-08-13 04:55:14,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1999490.0, ans=0.1 2024-08-13 04:55:18,734 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.956e-01 2024-08-13 04:55:23,090 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 20 from LS+wenet, 31 from Vox, 41 fro AS 2024-08-13 04:55:26,179 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 04:55:27,568 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=15.0 2024-08-13 04:56:11,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1999790.0, ans=0.125 2024-08-13 04:56:40,271 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11600, loss[loss=0.1218, beats_loss=0.009043, ecapa_loss=0.0002092, whisper_loss=0.1107, over 19251.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01068, ecapa_loss=0.0001679, whisper_loss=0.09267, over 3906903.34 frames. ], batch size: 77, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:56:49,601 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 04:56:56,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1999990.0, ans=0.0 2024-08-13 04:57:12,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.38 vs. limit=15.0 2024-08-13 04:57:14,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2000090.0, ans=0.0 2024-08-13 04:57:56,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.423e+01 2.636e+01 2.832e+01 7.836e+01, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 04:57:57,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2000290.0, ans=0.125 2024-08-13 04:58:00,233 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.69 vs. 
limit=15.0 2024-08-13 04:58:22,168 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11650, loss[loss=0.1216, beats_loss=0.008894, ecapa_loss=0.0001475, whisper_loss=0.1112, over 18182.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01072, ecapa_loss=0.0001678, whisper_loss=0.09257, over 3941983.36 frames. ], batch size: 67, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:58:26,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2000490.0, ans=0.0 2024-08-13 04:58:43,828 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 04:59:00,326 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 04:59:06,887 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-13 04:59:12,628 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 04:59:26,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2000790.0, ans=0.125 2024-08-13 04:59:32,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2000790.0, ans=0.125 2024-08-13 04:59:35,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2000790.0, ans=0.0 2024-08-13 04:59:40,091 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
25 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 04:59:50,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2000890.0, ans=0.2 2024-08-13 04:59:56,916 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11700, loss[loss=0.122, beats_loss=0.01093, ecapa_loss=0.0001271, whisper_loss=0.1098, over 23536.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01074, ecapa_loss=0.0001686, whisper_loss=0.09254, over 3944308.82 frames. ], batch size: 87, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:00:22,611 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 05:00:36,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2001190.0, ans=0.04949747468305833 2024-08-13 05:00:44,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.34 vs. limit=15.0 2024-08-13 05:00:46,939 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 05:00:57,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2001290.0, ans=0.1 2024-08-13 05:01:00,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2001290.0, ans=0.0 2024-08-13 05:01:06,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2001290.0, ans=0.125 2024-08-13 05:01:07,047 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.358e+01 2.707e+01 3.132e+01 5.516e+01, threshold=5.414e+01, percent-clipped=1.0 2024-08-13 05:01:07,189 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 05:01:13,911 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 05:01:15,026 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-13 05:01:30,498 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11750, loss[loss=0.1167, beats_loss=0.008166, ecapa_loss=0.0001589, whisper_loss=0.107, over 22761.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01086, ecapa_loss=0.0001686, whisper_loss=0.09199, over 3953740.49 frames. ], batch size: 84, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:01:33,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2001490.0, ans=0.125 2024-08-13 05:01:45,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2001490.0, ans=10.0 2024-08-13 05:01:52,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2001590.0, ans=0.125 2024-08-13 05:02:01,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2001590.0, ans=0.125 2024-08-13 05:02:09,819 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 05:02:31,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2001790.0, ans=0.125 2024-08-13 05:02:32,752 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 05:02:36,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2001790.0, ans=0.2 2024-08-13 05:02:42,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2001790.0, ans=0.125 2024-08-13 05:02:44,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2001890.0, ans=0.0 2024-08-13 05:02:49,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2001890.0, ans=0.0 2024-08-13 05:03:03,026 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11800, loss[loss=0.1008, beats_loss=0.01075, ecapa_loss=0.0001666, whisper_loss=0.08843, over 15172.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01091, ecapa_loss=0.0001683, whisper_loss=0.09169, over 3956188.85 frames. ], batch size: 60, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:03:07,977 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 05:03:09,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2001990.0, ans=0.125 2024-08-13 05:03:27,110 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 05:03:45,779 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.19 vs. limit=15.0 2024-08-13 05:03:50,776 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-13 05:03:58,237 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.00 vs. 
limit=6.0 2024-08-13 05:04:01,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2002290.0, ans=0.125 2024-08-13 05:04:01,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2002290.0, ans=0.1 2024-08-13 05:04:06,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.539e+01 2.830e+01 3.148e+01 9.366e+01, threshold=5.659e+01, percent-clipped=1.0 2024-08-13 05:04:08,454 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 27 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 05:04:18,233 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 05:04:29,024 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11850, loss[loss=0.08891, beats_loss=0.013, ecapa_loss=0.0001535, whisper_loss=0.07438, over 21144.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.000168, whisper_loss=0.09157, over 3932905.33 frames. ], batch size: 86, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:04:31,976 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 05:04:39,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2002490.0, ans=0.09899494936611666 2024-08-13 05:05:01,602 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 05:05:19,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2002690.0, ans=15.0 2024-08-13 05:05:57,508 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11900, loss[loss=0.07409, beats_loss=0.01295, ecapa_loss=0.000139, whisper_loss=0.05975, over 19812.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01104, ecapa_loss=0.0001674, whisper_loss=0.09065, over 3926242.01 frames. ], batch size: 81, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:06:19,587 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 05:06:22,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2003090.0, ans=0.0 2024-08-13 05:06:25,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.91 vs. limit=22.5 2024-08-13 05:06:31,022 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-13 05:06:32,831 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 05:06:59,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2003290.0, ans=0.0 2024-08-13 05:07:00,967 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.520e+01 2.675e+01 3.005e+01 5.998e+01, threshold=5.349e+01, percent-clipped=1.0 2024-08-13 05:07:02,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2003290.0, ans=0.0 2024-08-13 05:07:07,112 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 05:07:21,030 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-13 05:07:23,697 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 11950, loss[loss=0.1107, beats_loss=0.009917, ecapa_loss=0.0001996, whisper_loss=0.09875, over 22457.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01097, ecapa_loss=0.000168, whisper_loss=0.09085, over 3895912.18 frames. 
], batch size: 89, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:07:25,785 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 30 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 05:07:42,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.23 vs. limit=10.0 2024-08-13 05:07:45,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2003590.0, ans=0.0 2024-08-13 05:08:06,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2003690.0, ans=0.1 2024-08-13 05:08:10,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2024-08-13 05:08:16,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2003790.0, ans=0.2 2024-08-13 05:08:24,846 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-13 05:08:26,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.45 vs. limit=15.0 2024-08-13 05:08:49,874 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12000, loss[loss=0.1161, beats_loss=0.009768, ecapa_loss=0.000183, whisper_loss=0.1045, over 22401.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.011, ecapa_loss=0.0001668, whisper_loss=0.09087, over 3916728.28 frames. 
], batch size: 87, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:08:49,875 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 05:09:29,176 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005731, whisper_loss=0.2468, over 922467.00 frames. 2024-08-13 05:09:48,302 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on SV_voxceleb1: loss=0.004602, beats_loss=0, ecapa_loss=0.0004602, whisper_loss=0, over 939242.00 frames. 2024-08-13 05:11:41,032 INFO [train_multi_KD3.py:1149] (2/4) Epoch 14, validation on AT_audioset: loss=0.0239, beats_loss=0.0239, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 05:11:41,035 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-13 05:11:44,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5 2024-08-13 05:11:58,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2004090.0, ans=0.2 2024-08-13 05:12:09,758 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.44 vs. limit=15.0 2024-08-13 05:12:15,923 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 05:12:29,622 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 05:12:43,263 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.475e+01 2.665e+01 3.111e+01 1.048e+02, threshold=5.329e+01, percent-clipped=1.0 2024-08-13 05:12:50,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2004390.0, ans=0.0 2024-08-13 05:13:04,542 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12050, loss[loss=0.1083, beats_loss=0.01016, ecapa_loss=0.0001531, whisper_loss=0.09659, over 19714.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01095, ecapa_loss=0.0001677, whisper_loss=0.09115, over 3928930.75 frames. ], batch size: 75, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:13:05,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2004490.0, ans=0.0 2024-08-13 05:13:12,242 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-13 05:13:23,369 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.42 vs. limit=22.5 2024-08-13 05:13:26,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2004590.0, ans=0.125 2024-08-13 05:13:35,524 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-13 05:13:51,618 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 05:14:05,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2004790.0, ans=0.1 2024-08-13 05:14:13,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2004890.0, ans=0.125 2024-08-13 05:14:26,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2004890.0, ans=0.2 2024-08-13 05:14:28,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=2004990.0, ans=22.5 2024-08-13 05:14:28,960 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12100, loss[loss=0.1304, beats_loss=0.007911, ecapa_loss=0.0001949, whisper_loss=0.1206, over 22699.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01088, ecapa_loss=0.0001681, whisper_loss=0.09146, over 3914339.99 frames. ], batch size: 92, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:14:35,910 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 05:14:38,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2004990.0, ans=0.2 2024-08-13 05:14:45,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.42 vs. 
limit=15.0 2024-08-13 05:14:59,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2005090.0, ans=0.2 2024-08-13 05:15:01,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2005190.0, ans=0.07 2024-08-13 05:15:31,283 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.461e+01 2.696e+01 3.254e+01 5.243e+01, threshold=5.392e+01, percent-clipped=0.0 2024-08-13 05:15:34,145 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 05:15:52,070 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12150, loss[loss=0.0866, beats_loss=0.01093, ecapa_loss=0.0001674, whisper_loss=0.074, over 15575.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01085, ecapa_loss=0.0001686, whisper_loss=0.09141, over 3912383.54 frames. ], batch size: 62, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:15:57,984 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 05:15:59,307 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 05:15:59,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2005490.0, ans=0.2 2024-08-13 05:16:02,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.33 vs. 
limit=15.0 2024-08-13 05:16:10,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2005590.0, ans=0.5 2024-08-13 05:16:19,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2005590.0, ans=0.125 2024-08-13 05:16:21,223 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-13 05:16:26,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2005690.0, ans=0.2 2024-08-13 05:16:48,278 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.69 vs. limit=22.5 2024-08-13 05:16:59,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2005890.0, ans=0.125 2024-08-13 05:17:07,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2005890.0, ans=0.0 2024-08-13 05:17:09,534 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-08-13 05:17:11,887 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 05:17:14,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2005890.0, ans=0.125 2024-08-13 05:17:17,275 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12200, loss[loss=0.1269, beats_loss=0.00874, ecapa_loss=0.0001517, whisper_loss=0.1166, over 18424.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01088, ecapa_loss=0.0001674, whisper_loss=0.09157, over 3907735.85 frames. 
], batch size: 68, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:17:23,847 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.90 vs. limit=22.5 2024-08-13 05:17:25,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2005990.0, ans=0.0 2024-08-13 05:17:28,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2005990.0, ans=0.125 2024-08-13 05:17:56,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2006190.0, ans=0.0 2024-08-13 05:18:06,082 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-13 05:18:10,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2006290.0, ans=0.1 2024-08-13 05:18:17,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2006290.0, ans=0.1 2024-08-13 05:18:21,610 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.467e+01 2.824e+01 3.197e+01 4.821e+01, threshold=5.649e+01, percent-clipped=0.0 2024-08-13 05:18:32,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-08-13 05:18:42,539 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12250, loss[loss=0.104, beats_loss=0.0133, ecapa_loss=0.0001678, whisper_loss=0.08907, over 22417.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01083, ecapa_loss=0.0001673, whisper_loss=0.09165, over 3898413.77 frames. 
], batch size: 91, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:18:46,137 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 24 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-13 05:18:47,821 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 30 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 05:18:55,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2006490.0, ans=0.0 2024-08-13 05:19:08,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2006590.0, ans=0.04949747468305833 2024-08-13 05:19:14,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2006690.0, ans=0.125 2024-08-13 05:19:18,870 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 05:19:46,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2006790.0, ans=0.1 2024-08-13 05:19:51,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2024-08-13 05:19:59,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-08-13 05:20:00,741 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 05:20:04,694 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12300, loss[loss=0.1043, beats_loss=0.01247, ecapa_loss=0.0001366, whisper_loss=0.09043, over 16109.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01072, ecapa_loss=0.0001686, whisper_loss=0.09248, over 3896013.29 frames. 
], batch size: 60, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:20:08,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2006990.0, ans=0.0 2024-08-13 05:20:17,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2006990.0, ans=0.07 2024-08-13 05:20:34,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.15 vs. limit=22.5 2024-08-13 05:20:40,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2007190.0, ans=0.1 2024-08-13 05:20:42,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2007190.0, ans=0.125 2024-08-13 05:20:53,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2024-08-13 05:21:06,649 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.461e+01 2.771e+01 3.048e+01 4.529e+01, threshold=5.542e+01, percent-clipped=0.0 2024-08-13 05:21:16,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2007390.0, ans=0.1 2024-08-13 05:21:29,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2007490.0, ans=0.0 2024-08-13 05:21:30,658 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12350, loss[loss=0.1131, beats_loss=0.00897, ecapa_loss=0.0002028, whisper_loss=0.1021, over 21348.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01065, ecapa_loss=0.0001699, whisper_loss=0.09298, over 3873946.95 frames. 
], batch size: 88, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:21:34,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.08 vs. limit=22.5 2024-08-13 05:21:44,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2007490.0, ans=0.0 2024-08-13 05:21:44,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2007490.0, ans=0.125 2024-08-13 05:21:45,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2024-08-13 05:21:54,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2007590.0, ans=0.1 2024-08-13 05:22:12,493 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 05:22:37,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2007890.0, ans=0.09899494936611666 2024-08-13 05:22:42,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2007890.0, ans=0.125 2024-08-13 05:22:55,510 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12400, loss[loss=0.08699, beats_loss=0.01399, ecapa_loss=0.0001659, whisper_loss=0.07134, over 15292.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01068, ecapa_loss=0.0001705, whisper_loss=0.09291, over 3866229.86 frames. 
], batch size: 63, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:23:12,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2008090.0, ans=0.0 2024-08-13 05:23:20,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2008090.0, ans=0.2 2024-08-13 05:23:30,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2008190.0, ans=0.0 2024-08-13 05:23:43,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2008190.0, ans=0.125 2024-08-13 05:23:59,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.499e+01 2.802e+01 3.094e+01 1.002e+02, threshold=5.604e+01, percent-clipped=2.0 2024-08-13 05:24:01,565 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 05:24:07,203 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-13 05:24:21,148 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 05:24:22,303 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12450, loss[loss=0.1002, beats_loss=0.009177, ecapa_loss=0.0001451, whisper_loss=0.08957, over 19792.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01069, ecapa_loss=0.0001709, whisper_loss=0.09265, over 3905761.18 frames. ], batch size: 76, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:24:29,102 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
23 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-13 05:24:29,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2008490.0, ans=0.125 2024-08-13 05:24:29,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2008490.0, ans=0.0 2024-08-13 05:24:33,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=2008490.0, ans=12.0 2024-08-13 05:24:39,198 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-13 05:24:42,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2008590.0, ans=0.125 2024-08-13 05:24:53,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2008590.0, ans=0.0 2024-08-13 05:25:11,347 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 05:25:41,168 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.76 vs. limit=10.0 2024-08-13 05:25:44,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.69 vs. limit=22.5 2024-08-13 05:25:50,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12500, loss[loss=0.09408, beats_loss=0.01129, ecapa_loss=0.0001557, whisper_loss=0.08123, over 18293.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01072, ecapa_loss=0.0001699, whisper_loss=0.09222, over 3878436.54 frames. 
], batch size: 71, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:25:52,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.87 vs. limit=10.0 2024-08-13 05:25:53,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.41 vs. limit=22.5 2024-08-13 05:25:54,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2008990.0, ans=0.125 2024-08-13 05:25:58,398 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-13 05:26:07,892 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 05:26:10,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.61 vs. limit=15.0 2024-08-13 05:26:21,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2009190.0, ans=0.09899494936611666 2024-08-13 05:26:38,989 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-13 05:26:39,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2009290.0, ans=0.1 2024-08-13 05:26:42,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2009290.0, ans=0.125 2024-08-13 05:26:51,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.389e+01 2.676e+01 3.149e+01 9.586e+01, threshold=5.353e+01, percent-clipped=2.0 2024-08-13 05:26:57,436 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
25 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-13 05:27:01,300 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-13 05:27:03,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2009390.0, ans=0.0 2024-08-13 05:27:09,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2024-08-13 05:27:14,376 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12550, loss[loss=0.1048, beats_loss=0.01094, ecapa_loss=0.0001383, whisper_loss=0.09244, over 22520.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01081, ecapa_loss=0.0001691, whisper_loss=0.09176, over 3867002.60 frames. ], batch size: 89, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:27:18,260 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-13 05:27:19,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2009490.0, ans=0.125 2024-08-13 05:27:25,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2009490.0, ans=0.1 2024-08-13 05:27:30,311 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2024-08-13 05:27:32,640 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 05:27:57,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2009690.0, ans=0.125 2024-08-13 05:28:08,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.84 vs. 
limit=15.0 2024-08-13 05:28:20,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.61 vs. limit=15.0 2024-08-13 05:28:35,802 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12600, loss[loss=0.104, beats_loss=0.01192, ecapa_loss=0.0001796, whisper_loss=0.09026, over 15157.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.0001681, whisper_loss=0.09144, over 3849516.39 frames. ], batch size: 61, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:28:38,911 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-13 05:28:54,959 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 05:29:01,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2010090.0, ans=0.125 2024-08-13 05:29:06,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2010190.0, ans=0.125 2024-08-13 05:29:22,850 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
13 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 05:29:26,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2010290.0, ans=0.125 2024-08-13 05:29:27,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2010290.0, ans=0.1 2024-08-13 05:29:30,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2010290.0, ans=0.0 2024-08-13 05:29:36,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.351e+01 2.664e+01 2.979e+01 4.679e+01, threshold=5.327e+01, percent-clipped=0.0 2024-08-13 05:29:38,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2010290.0, ans=0.125 2024-08-13 05:29:46,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2010390.0, ans=0.125 2024-08-13 05:29:57,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12650, loss[loss=0.1095, beats_loss=0.011, ecapa_loss=0.000171, whisper_loss=0.09684, over 23700.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01084, ecapa_loss=0.0001691, whisper_loss=0.09153, over 3862617.82 frames. ], batch size: 95, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:30:03,470 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2024-08-13 05:30:11,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2010490.0, ans=6.0 2024-08-13 05:30:14,883 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
31 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-13 05:30:16,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2010590.0, ans=0.015 2024-08-13 05:30:27,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2010590.0, ans=0.125 2024-08-13 05:30:28,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2010590.0, ans=0.125 2024-08-13 05:30:30,826 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-13 05:30:31,679 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-13 05:30:32,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2010690.0, ans=0.1 2024-08-13 05:30:36,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0 2024-08-13 05:30:55,939 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 16 from LS+wenet, 36 from Vox, 22 fro AS 2024-08-13 05:31:00,329 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 05:31:00,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2010790.0, ans=0.1 2024-08-13 05:31:21,807 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12700, loss[loss=0.1233, beats_loss=0.009246, ecapa_loss=0.0001812, whisper_loss=0.1122, over 16874.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.0001692, whisper_loss=0.09167, over 3874164.65 frames. 
], batch size: 64, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:31:26,654 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 05:31:38,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2011090.0, ans=10.0 2024-08-13 05:31:54,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2011190.0, ans=0.0 2024-08-13 05:32:11,049 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 05:32:21,912 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.465e+01 2.775e+01 3.008e+01 5.404e+01, threshold=5.550e+01, percent-clipped=1.0 2024-08-13 05:32:26,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2011390.0, ans=0.125 2024-08-13 05:32:42,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12750, loss[loss=0.1034, beats_loss=0.01185, ecapa_loss=0.0001257, whisper_loss=0.09034, over 23337.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.0001676, whisper_loss=0.09157, over 3883989.32 frames. ], batch size: 90, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:32:46,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2011490.0, ans=0.0 2024-08-13 05:32:54,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2011490.0, ans=0.5 2024-08-13 05:33:00,633 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 05:33:22,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2011690.0, ans=0.1 2024-08-13 05:33:42,736 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.48 vs. limit=15.0 2024-08-13 05:33:56,100 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2024-08-13 05:34:03,670 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12800, loss[loss=0.09165, beats_loss=0.009631, ecapa_loss=0.0001765, whisper_loss=0.08026, over 16737.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01093, ecapa_loss=0.0001684, whisper_loss=0.0913, over 3897338.50 frames. ], batch size: 64, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:34:18,594 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 05:34:21,853 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 18 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-13 05:34:23,453 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 05:34:23,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2012090.0, ans=0.125 2024-08-13 05:34:31,643 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
22 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-13 05:34:36,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2012190.0, ans=0.125 2024-08-13 05:34:46,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2012190.0, ans=0.125 2024-08-13 05:34:46,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2012190.0, ans=0.2 2024-08-13 05:34:55,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=12.0 2024-08-13 05:34:55,711 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-13 05:34:58,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2012290.0, ans=0.2 2024-08-13 05:35:05,403 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.426e+01 2.719e+01 3.089e+01 6.356e+01, threshold=5.438e+01, percent-clipped=2.0 2024-08-13 05:35:27,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12850, loss[loss=0.1161, beats_loss=0.01093, ecapa_loss=0.0001329, whisper_loss=0.1039, over 20412.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01101, ecapa_loss=0.0001673, whisper_loss=0.09034, over 3866414.61 frames. ], batch size: 78, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:36:07,455 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
26 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 05:36:07,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2012690.0, ans=0.1 2024-08-13 05:36:16,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2012790.0, ans=0.125 2024-08-13 05:36:17,338 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-13 05:36:38,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2012890.0, ans=0.125 2024-08-13 05:36:41,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2012890.0, ans=0.1 2024-08-13 05:36:47,367 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12900, loss[loss=0.1171, beats_loss=0.009891, ecapa_loss=0.0001697, whisper_loss=0.1055, over 21962.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01098, ecapa_loss=0.0001678, whisper_loss=0.09007, over 3874461.67 frames. ], batch size: 87, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:37:27,646 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.50 vs. limit=22.5 2024-08-13 05:37:30,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2013190.0, ans=0.1 2024-08-13 05:37:31,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2013190.0, ans=0.125 2024-08-13 05:37:44,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.357e+01 2.603e+01 2.918e+01 4.145e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-13 05:37:49,673 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 05:37:50,650 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.79 vs. limit=15.0 2024-08-13 05:38:07,033 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 12950, loss[loss=0.09646, beats_loss=0.01144, ecapa_loss=0.000129, whisper_loss=0.08373, over 21500.00 frames. ], tot_loss[loss=0.103, beats_loss=0.011, ecapa_loss=0.0001666, whisper_loss=0.09028, over 3890240.45 frames. ], batch size: 83, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:38:08,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2013490.0, ans=0.125 2024-08-13 05:38:17,440 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 05:38:24,140 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 05:38:34,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2013590.0, ans=0.95 2024-08-13 05:38:46,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2013690.0, ans=0.125 2024-08-13 05:38:52,218 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 05:38:56,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2013790.0, ans=0.1 2024-08-13 05:38:57,673 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 05:39:01,000 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 22 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-13 05:39:03,966 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
17 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-13 05:39:04,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2013790.0, ans=0.2 2024-08-13 05:39:09,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2013790.0, ans=0.125 2024-08-13 05:39:23,497 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 05:39:30,055 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13000, loss[loss=0.1055, beats_loss=0.01134, ecapa_loss=0.0001597, whisper_loss=0.09252, over 21544.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01095, ecapa_loss=0.0001675, whisper_loss=0.09091, over 3878430.69 frames. ], batch size: 86, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:39:34,890 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=12.0 2024-08-13 05:39:50,627 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 05:40:01,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2014090.0, ans=0.125 2024-08-13 05:40:14,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2014190.0, ans=0.125 2024-08-13 05:40:28,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2014290.0, ans=0.125 2024-08-13 05:40:31,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.457e+01 2.798e+01 3.261e+01 6.703e+01, threshold=5.596e+01, percent-clipped=3.0 2024-08-13 05:40:44,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2014390.0, ans=0.07 2024-08-13 05:40:52,676 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13050, loss[loss=0.1154, beats_loss=0.008289, ecapa_loss=0.0001915, whisper_loss=0.1052, over 13182.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001676, whisper_loss=0.09178, over 3895996.17 frames. ], batch size: 54, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:41:08,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2014590.0, ans=0.0 2024-08-13 05:41:12,534 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2024-08-13 05:41:19,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=15.0 2024-08-13 05:42:00,990 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
33 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 05:42:12,427 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13100, loss[loss=0.1249, beats_loss=0.008541, ecapa_loss=0.0001601, whisper_loss=0.1148, over 15779.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01092, ecapa_loss=0.0001678, whisper_loss=0.09094, over 3902252.61 frames. ], batch size: 59, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:42:32,206 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 05:42:53,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2015190.0, ans=0.1 2024-08-13 05:43:06,366 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 05:43:12,720 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.412e+01 2.747e+01 3.007e+01 5.883e+01, threshold=5.493e+01, percent-clipped=1.0 2024-08-13 05:43:18,555 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 05:43:32,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2015490.0, ans=0.0 2024-08-13 05:43:33,688 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13150, loss[loss=0.1244, beats_loss=0.008655, ecapa_loss=0.0001819, whisper_loss=0.1139, over 13542.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01088, ecapa_loss=0.0001667, whisper_loss=0.091, over 3897077.85 frames. 
], batch size: 54, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:43:51,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2015590.0, ans=0.125 2024-08-13 05:44:33,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2015790.0, ans=0.1 2024-08-13 05:44:42,184 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 05:44:44,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2015890.0, ans=0.2 2024-08-13 05:44:53,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13200, loss[loss=0.06916, beats_loss=0.01367, ecapa_loss=0.0001495, whisper_loss=0.05399, over 21638.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001664, whisper_loss=0.09101, over 3878535.09 frames. ], batch size: 93, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:45:21,626 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-13 05:45:32,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2016190.0, ans=0.1 2024-08-13 05:45:39,472 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0 2024-08-13 05:45:42,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2016290.0, ans=0.2 2024-08-13 05:45:43,442 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 05:45:48,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. 
limit=15.0 2024-08-13 05:45:52,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2016290.0, ans=15.0 2024-08-13 05:45:53,631 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.65 vs. limit=15.0 2024-08-13 05:45:53,966 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.423e+01 2.725e+01 2.981e+01 4.895e+01, threshold=5.450e+01, percent-clipped=0.0 2024-08-13 05:46:11,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.95 vs. limit=10.0 2024-08-13 05:46:14,911 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13250, loss[loss=0.11, beats_loss=0.01113, ecapa_loss=0.0001709, whisper_loss=0.0972, over 22836.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01081, ecapa_loss=0.0001671, whisper_loss=0.09219, over 3899676.19 frames. ], batch size: 91, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:46:22,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-08-13 05:46:33,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2016590.0, ans=0.125 2024-08-13 05:46:35,714 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.90 vs. 
limit=22.5 2024-08-13 05:46:41,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2016590.0, ans=0.0 2024-08-13 05:46:51,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2016690.0, ans=0.0 2024-08-13 05:47:12,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2016790.0, ans=0.125 2024-08-13 05:47:25,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2016890.0, ans=0.2 2024-08-13 05:47:41,107 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13300, loss[loss=0.1129, beats_loss=0.0112, ecapa_loss=0.0001495, whisper_loss=0.1002, over 22094.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01086, ecapa_loss=0.000166, whisper_loss=0.09246, over 3933829.21 frames. ], batch size: 87, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:47:41,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2016990.0, ans=0.1 2024-08-13 05:48:18,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2017190.0, ans=0.0 2024-08-13 05:48:23,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2017190.0, ans=0.2 2024-08-13 05:48:42,651 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.445e+01 2.718e+01 3.162e+01 4.686e+01, threshold=5.435e+01, percent-clipped=0.0 2024-08-13 05:48:52,871 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 05:48:58,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2017390.0, ans=0.125 2024-08-13 05:49:03,862 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13350, loss[loss=0.1122, beats_loss=0.01081, ecapa_loss=0.0001711, whisper_loss=0.09967, over 18084.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001658, whisper_loss=0.09178, over 3927874.74 frames. ], batch size: 72, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:49:06,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2017490.0, ans=0.0 2024-08-13 05:49:15,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2017490.0, ans=0.0 2024-08-13 05:49:18,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2017490.0, ans=0.0 2024-08-13 05:49:18,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2017490.0, ans=0.0 2024-08-13 05:49:23,047 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.813e+05 2024-08-13 05:49:29,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2017590.0, ans=0.125 2024-08-13 05:49:29,453 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-13 05:49:37,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2017690.0, ans=0.0 2024-08-13 05:49:45,372 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 05:49:47,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2017690.0, ans=0.125 2024-08-13 05:49:50,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2017690.0, ans=0.04949747468305833 2024-08-13 05:50:26,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13400, loss[loss=0.1039, beats_loss=0.009591, ecapa_loss=0.0001968, whisper_loss=0.09234, over 21133.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01091, ecapa_loss=0.0001652, whisper_loss=0.09171, over 3895233.31 frames. ], batch size: 91, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:50:28,213 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 05:50:36,441 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.77 vs. limit=22.5 2024-08-13 05:50:42,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.98 vs. limit=10.0 2024-08-13 05:51:00,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2018190.0, ans=0.0 2024-08-13 05:51:04,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2024-08-13 05:51:13,358 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-13 05:51:16,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2018290.0, ans=0.0 2024-08-13 05:51:19,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2018290.0, ans=0.0 2024-08-13 05:51:25,042 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.295e-03 2024-08-13 05:51:28,854 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.489e+01 2.760e+01 3.071e+01 5.716e+01, threshold=5.519e+01, percent-clipped=1.0 2024-08-13 05:51:29,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2018290.0, ans=0.0 2024-08-13 05:51:46,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2018390.0, ans=0.1 2024-08-13 05:51:50,203 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13450, loss[loss=0.08677, beats_loss=0.01226, ecapa_loss=0.0001335, whisper_loss=0.07317, over 16237.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01086, ecapa_loss=0.000167, whisper_loss=0.09189, over 3883695.95 frames. ], batch size: 63, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:52:11,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2024-08-13 05:52:21,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2018590.0, ans=15.0 2024-08-13 05:52:33,682 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.62 vs. 
limit=15.0 2024-08-13 05:52:39,436 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.896e-01 2024-08-13 05:52:51,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2018790.0, ans=0.125 2024-08-13 05:52:52,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2018790.0, ans=0.125 2024-08-13 05:53:00,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2018890.0, ans=0.125 2024-08-13 05:53:05,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2018890.0, ans=0.125 2024-08-13 05:53:06,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2018890.0, ans=0.125 2024-08-13 05:53:11,522 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 05:53:14,442 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13500, loss[loss=0.1099, beats_loss=0.009363, ecapa_loss=0.000203, whisper_loss=0.09851, over 21788.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01089, ecapa_loss=0.0001667, whisper_loss=0.09128, over 3882150.41 frames. ], batch size: 92, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:53:22,609 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-13 05:53:24,146 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-13 05:53:39,804 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
17 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 05:53:42,410 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.00 vs. limit=15.0 2024-08-13 05:53:50,535 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 05:54:17,718 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.519e+01 2.845e+01 3.228e+01 5.669e+01, threshold=5.689e+01, percent-clipped=1.0 2024-08-13 05:54:18,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2019290.0, ans=0.0 2024-08-13 05:54:39,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13550, loss[loss=0.1156, beats_loss=0.008963, ecapa_loss=0.0002162, whisper_loss=0.1045, over 16366.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0109, ecapa_loss=0.0001664, whisper_loss=0.09094, over 3831161.62 frames. ], batch size: 69, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:54:39,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2019490.0, ans=0.125 2024-08-13 05:54:42,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2019490.0, ans=0.2 2024-08-13 05:54:49,496 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 13 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 05:55:08,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2019590.0, ans=0.125 2024-08-13 05:55:11,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.90 vs. limit=22.5 2024-08-13 05:55:22,022 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
15 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 05:55:32,398 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-13 05:55:33,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2019790.0, ans=0.05 2024-08-13 05:56:02,569 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13600, loss[loss=0.108, beats_loss=0.01168, ecapa_loss=0.000166, whisper_loss=0.09467, over 14825.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01094, ecapa_loss=0.0001655, whisper_loss=0.09083, over 3833805.55 frames. ], batch size: 59, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:56:07,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2019990.0, ans=0.1 2024-08-13 05:56:10,485 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-13 05:56:14,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2024-08-13 05:56:22,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=8.0 2024-08-13 05:56:29,363 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-13 05:56:47,018 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 05:56:58,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. 
limit=6.0 2024-08-13 05:57:03,762 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.439e+01 2.789e+01 3.158e+01 4.809e+01, threshold=5.578e+01, percent-clipped=0.0 2024-08-13 05:57:07,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2020390.0, ans=0.0 2024-08-13 05:57:17,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2024-08-13 05:57:25,490 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13650, loss[loss=0.1102, beats_loss=0.008977, ecapa_loss=0.0002241, whisper_loss=0.09894, over 21317.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01098, ecapa_loss=0.0001658, whisper_loss=0.09019, over 3837046.89 frames. ], batch size: 88, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:57:29,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2020490.0, ans=0.125 2024-08-13 05:57:37,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.00 vs. limit=15.0 2024-08-13 05:57:43,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2020590.0, ans=0.125 2024-08-13 05:57:43,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2020590.0, ans=0.1 2024-08-13 05:57:46,667 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 05:57:47,975 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 05:58:03,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2020690.0, ans=0.1 2024-08-13 05:58:04,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2020690.0, ans=0.125 2024-08-13 05:58:22,902 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 21 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 05:58:25,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-08-13 05:58:45,111 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13700, loss[loss=0.1122, beats_loss=0.008373, ecapa_loss=0.0001819, whisper_loss=0.102, over 14510.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01105, ecapa_loss=0.0001655, whisper_loss=0.09053, over 3850678.60 frames. ], batch size: 55, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:58:50,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2020990.0, ans=0.125 2024-08-13 05:58:56,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2020990.0, ans=0.0 2024-08-13 05:59:03,805 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-13 05:59:15,343 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 05:59:30,857 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.721e-03 2024-08-13 05:59:34,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2021290.0, ans=0.2 2024-08-13 05:59:40,358 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.485e+01 2.717e+01 3.143e+01 5.833e+01, threshold=5.434e+01, percent-clipped=2.0 2024-08-13 05:59:55,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2021390.0, ans=0.0 2024-08-13 05:59:58,605 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13750, loss[loss=0.11, beats_loss=0.009178, ecapa_loss=0.0001743, whisper_loss=0.09907, over 14069.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01092, ecapa_loss=0.0001662, whisper_loss=0.09114, over 3848280.64 frames. ], batch size: 54, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:00:05,406 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 06:00:21,544 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2024-08-13 06:00:23,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2021590.0, ans=0.0 2024-08-13 06:00:45,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2021790.0, ans=0.2 2024-08-13 06:00:58,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2021890.0, ans=0.0 2024-08-13 06:01:03,532 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 06:01:07,289 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13800, loss[loss=0.09601, beats_loss=0.01038, ecapa_loss=0.0002037, whisper_loss=0.08359, over 13363.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001667, whisper_loss=0.09114, over 3829869.18 frames. ], batch size: 56, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:01:42,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2022190.0, ans=0.5 2024-08-13 06:01:43,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2022190.0, ans=0.05 2024-08-13 06:01:49,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2022290.0, ans=0.125 2024-08-13 06:01:50,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0 2024-08-13 06:01:57,716 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.404e+01 2.696e+01 2.984e+01 4.554e+01, threshold=5.391e+01, percent-clipped=0.0 2024-08-13 06:02:07,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2022390.0, ans=0.125 2024-08-13 06:02:15,274 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13850, loss[loss=0.08269, beats_loss=0.01311, ecapa_loss=0.0001664, whisper_loss=0.06792, over 13490.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01091, ecapa_loss=0.0001665, whisper_loss=0.09054, over 3824284.15 frames. 
], batch size: 54, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:02:18,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2022490.0, ans=0.1 2024-08-13 06:02:18,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.80 vs. limit=8.0 2024-08-13 06:02:20,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=15.0 2024-08-13 06:02:24,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2022490.0, ans=0.0 2024-08-13 06:02:28,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2022590.0, ans=0.04949747468305833 2024-08-13 06:02:30,695 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 06:02:34,968 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 06:03:05,603 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 06:03:07,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.41 vs. limit=22.5 2024-08-13 06:03:13,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2022890.0, ans=0.1 2024-08-13 06:03:14,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2022890.0, ans=0.0 2024-08-13 06:03:23,376 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 06:03:24,525 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13900, loss[loss=0.108, beats_loss=0.01032, ecapa_loss=0.000176, whisper_loss=0.09588, over 22859.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.000165, whisper_loss=0.09096, over 3862829.15 frames. ], batch size: 93, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:03:31,683 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 06:03:52,130 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0 2024-08-13 06:03:57,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2023190.0, ans=0.09899494936611666 2024-08-13 06:04:00,699 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 06:04:03,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2023190.0, ans=0.0 2024-08-13 06:04:11,867 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 06:04:15,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.449e+01 2.734e+01 3.123e+01 1.484e+02, threshold=5.468e+01, percent-clipped=1.0 2024-08-13 06:04:33,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 13950, loss[loss=0.111, beats_loss=0.009922, ecapa_loss=0.0001707, whisper_loss=0.09938, over 22041.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01085, ecapa_loss=0.0001649, whisper_loss=0.09105, over 3867821.44 frames. 
], batch size: 89, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:04:35,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2023490.0, ans=0.2 2024-08-13 06:04:35,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2023490.0, ans=0.0 2024-08-13 06:04:37,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-08-13 06:04:38,043 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 06:04:46,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2023590.0, ans=0.2 2024-08-13 06:04:48,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2023590.0, ans=0.125 2024-08-13 06:04:49,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2023590.0, ans=0.1 2024-08-13 06:04:53,535 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-13 06:05:01,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2023690.0, ans=0.1 2024-08-13 06:05:04,131 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-13 06:05:11,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2023690.0, ans=0.035 2024-08-13 06:05:30,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.90 vs. 
limit=22.5 2024-08-13 06:05:39,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2023890.0, ans=0.125 2024-08-13 06:05:41,535 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 14000, loss[loss=0.101, beats_loss=0.01103, ecapa_loss=0.0001663, whisper_loss=0.08828, over 22007.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001646, whisper_loss=0.09147, over 3877561.91 frames. ], batch size: 89, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:05:46,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2023990.0, ans=0.0 2024-08-13 06:06:02,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2024090.0, ans=0.2 2024-08-13 06:06:05,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2024090.0, ans=0.125 2024-08-13 06:06:24,368 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-13 06:06:32,359 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.439e+01 2.688e+01 3.210e+01 4.383e+01, threshold=5.377e+01, percent-clipped=0.0 2024-08-13 06:06:40,762 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 06:06:41,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2024390.0, ans=0.125 2024-08-13 06:06:46,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2024390.0, ans=0.125 2024-08-13 06:06:50,531 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 14050, loss[loss=0.1148, beats_loss=0.0101, ecapa_loss=0.000198, whisper_loss=0.1027, over 14021.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01088, ecapa_loss=0.0001649, whisper_loss=0.09185, over 3846815.70 frames. ], batch size: 55, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:07:24,975 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 06:07:25,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=12.0 2024-08-13 06:07:29,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2024690.0, ans=0.125 2024-08-13 06:07:32,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2024790.0, ans=0.125 2024-08-13 06:07:41,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2024790.0, ans=0.125 2024-08-13 06:07:54,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2024890.0, ans=0.2 2024-08-13 06:07:57,153 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 06:07:59,623 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 14100, loss[loss=0.1082, beats_loss=0.01227, ecapa_loss=0.0001585, whisper_loss=0.09431, over 22555.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01087, ecapa_loss=0.0001647, whisper_loss=0.09182, over 3839745.32 frames. ], batch size: 90, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:08:01,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2024990.0, ans=0.125 2024-08-13 06:08:13,646 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.44 vs. 
limit=22.5 2024-08-13 06:08:19,773 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 21 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-13 06:08:32,366 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 06:08:38,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2025190.0, ans=15.0 2024-08-13 06:08:42,768 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 06:08:43,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0 2024-08-13 06:08:47,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=12.0 2024-08-13 06:08:51,294 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.495e+01 2.684e+01 2.972e+01 8.600e+01, threshold=5.367e+01, percent-clipped=1.0 2024-08-13 06:09:03,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2025390.0, ans=0.125 2024-08-13 06:09:04,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.40 vs. limit=15.0 2024-08-13 06:09:09,037 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 14150, loss[loss=0.09987, beats_loss=0.009441, ecapa_loss=0.0001577, whisper_loss=0.08885, over 17411.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01085, ecapa_loss=0.0001649, whisper_loss=0.0919, over 3830563.23 frames. ], batch size: 69, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:09:22,735 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
22 from LS+wenet, 9 from Vox, 34 fro AS 2024-08-13 06:09:29,837 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 06:09:34,712 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0 2024-08-13 06:09:38,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2025690.0, ans=0.0 2024-08-13 06:09:46,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2025690.0, ans=0.125 2024-08-13 06:09:53,063 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 06:09:57,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2025790.0, ans=0.125 2024-08-13 06:10:12,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2024-08-13 06:10:14,954 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 06:10:17,480 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 14200, loss[loss=0.0849, beats_loss=0.01204, ecapa_loss=0.00018, whisper_loss=0.07107, over 21445.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0109, ecapa_loss=0.0001654, whisper_loss=0.09094, over 3864013.15 frames. ], batch size: 91, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:10:21,671 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-13 06:10:22,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.16 vs. 
limit=15.0 2024-08-13 06:10:39,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2026090.0, ans=0.1 2024-08-13 06:10:40,188 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=15.0 2024-08-13 06:10:59,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2026290.0, ans=0.125 2024-08-13 06:11:07,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.457e+01 2.666e+01 2.949e+01 5.330e+01, threshold=5.333e+01, percent-clipped=0.0 2024-08-13 06:11:12,712 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.631e+01 2024-08-13 06:11:15,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2026390.0, ans=0.0 2024-08-13 06:11:25,873 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 14250, loss[loss=0.1156, beats_loss=0.009577, ecapa_loss=0.0001756, whisper_loss=0.1042, over 21841.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01083, ecapa_loss=0.0001649, whisper_loss=0.0917, over 3847466.62 frames. ], batch size: 87, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:11:29,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2026490.0, ans=0.0 2024-08-13 06:11:30,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.31 vs. 
limit=12.0 2024-08-13 06:11:34,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2026490.0, ans=0.125 2024-08-13 06:11:44,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2026590.0, ans=0.0 2024-08-13 06:12:00,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2026690.0, ans=0.125 2024-08-13 06:12:12,376 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 06:12:14,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2026790.0, ans=0.1 2024-08-13 06:12:16,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2026790.0, ans=0.1 2024-08-13 06:12:22,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2026890.0, ans=0.125 2024-08-13 06:12:23,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2026890.0, ans=0.07 2024-08-13 06:12:26,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2026890.0, ans=0.2 2024-08-13 06:12:34,316 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 14300, loss[loss=0.1185, beats_loss=0.01015, ecapa_loss=0.0001561, whisper_loss=0.1068, over 14876.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001652, whisper_loss=0.09163, over 3859106.62 frames. ], batch size: 56, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:12:44,126 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
18 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-13 06:12:58,541 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 37 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-13 06:13:02,018 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.83 vs. limit=22.5 2024-08-13 06:13:17,908 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 06:13:20,500 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 19 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 06:13:24,262 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.493e+01 2.791e+01 3.138e+01 4.573e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-13 06:13:41,850 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 14350, loss[loss=0.07633, beats_loss=0.01182, ecapa_loss=0.000185, whisper_loss=0.06266, over 20248.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001643, whisper_loss=0.0912, over 3869834.96 frames. ], batch size: 84, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:13:42,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2027490.0, ans=0.5 2024-08-13 06:13:56,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=22.5 2024-08-13 06:14:35,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2027790.0, ans=0.1 2024-08-13 06:14:43,147 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. 
limit=15.0 2024-08-13 06:14:49,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2027890.0, ans=10.0 2024-08-13 06:14:52,197 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 14400, loss[loss=0.1207, beats_loss=0.008903, ecapa_loss=0.0001577, whisper_loss=0.1102, over 16752.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01084, ecapa_loss=0.0001656, whisper_loss=0.09147, over 3875344.01 frames. ], batch size: 65, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:14:56,475 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 06:15:08,884 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-13 06:15:12,016 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.366e-01 2024-08-13 06:15:32,089 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-13 06:15:45,230 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 33 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 06:15:46,234 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.480e+01 2.712e+01 3.054e+01 1.079e+02, threshold=5.424e+01, percent-clipped=2.0 2024-08-13 06:15:54,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2028390.0, ans=0.0 2024-08-13 06:15:55,032 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.66 vs. 
limit=12.0 2024-08-13 06:15:57,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2028390.0, ans=0.1 2024-08-13 06:16:06,680 INFO [train_multi_KD3.py:1116] (2/4) Epoch 14, batch 14450, loss[loss=0.111, beats_loss=0.009657, ecapa_loss=0.0001397, whisper_loss=0.09998, over 17575.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01092, ecapa_loss=0.0001652, whisper_loss=0.09132, over 3869521.40 frames. ], batch size: 66, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:16:09,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0 2024-08-13 06:16:23,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2024-08-13 06:16:27,384 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-13 06:16:29,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2028590.0, ans=0.125 2024-08-13 06:17:03,968 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-13 06:17:52,556 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 0, loss[loss=0.1201, beats_loss=0.008582, ecapa_loss=0.0001966, whisper_loss=0.1095, over 18637.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.008582, ecapa_loss=0.0001966, whisper_loss=0.1095, over 18637.00 frames. ], batch size: 74, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:17:52,557 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 06:18:35,128 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005623, whisper_loss=0.2479, over 922467.00 frames. 
2024-08-13 06:18:52,072 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on SV_voxceleb1: loss=0.004582, beats_loss=0, ecapa_loss=0.0004582, whisper_loss=0, over 939242.00 frames. 2024-08-13 06:20:54,487 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on AT_audioset: loss=0.02384, beats_loss=0.02384, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 06:20:54,489 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-13 06:20:54,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2028930.0, ans=0.125 2024-08-13 06:21:17,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2028930.0, ans=0.2 2024-08-13 06:21:25,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2029030.0, ans=0.0 2024-08-13 06:21:31,900 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 06:21:34,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.49 vs. limit=12.0 2024-08-13 06:22:08,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2029130.0, ans=0.0 2024-08-13 06:22:10,803 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 06:22:21,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2029230.0, ans=0.0 2024-08-13 06:22:47,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.60 vs. 
limit=10.0 2024-08-13 06:22:48,918 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.538e+01 2.901e+01 3.195e+01 5.923e+01, threshold=5.802e+01, percent-clipped=1.0 2024-08-13 06:23:02,958 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-13 06:23:05,528 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 50, loss[loss=0.08734, beats_loss=0.01401, ecapa_loss=0.0001733, whisper_loss=0.07159, over 22082.00 frames. ], tot_loss[loss=0.09975, beats_loss=0.0105, ecapa_loss=0.0001735, whisper_loss=0.08751, over 897435.43 frames. ], batch size: 94, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:23:06,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2029430.0, ans=0.125 2024-08-13 06:23:11,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2029430.0, ans=0.125 2024-08-13 06:23:53,173 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.86 vs. limit=22.5 2024-08-13 06:23:54,849 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 06:24:31,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2029730.0, ans=0.2 2024-08-13 06:24:45,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.76 vs. 
limit=10.0 2024-08-13 06:24:47,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2029830.0, ans=0.2 2024-08-13 06:24:49,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2029830.0, ans=0.0 2024-08-13 06:25:04,420 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 100, loss[loss=0.08216, beats_loss=0.01185, ecapa_loss=0.0001442, whisper_loss=0.06887, over 19433.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01005, ecapa_loss=0.0001727, whisper_loss=0.08993, over 1558723.95 frames. ], batch size: 77, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:25:07,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2029930.0, ans=0.125 2024-08-13 06:25:09,100 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 06:25:44,273 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-13 06:25:51,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2030130.0, ans=0.125 2024-08-13 06:25:59,754 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-13 06:26:04,409 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-13 06:26:42,727 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.792e+01 3.150e+01 3.564e+01 5.697e+01, threshold=6.299e+01, percent-clipped=0.0 2024-08-13 06:26:47,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.39 vs. 
limit=12.0 2024-08-13 06:26:56,778 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 150, loss[loss=0.09918, beats_loss=0.01121, ecapa_loss=0.0001843, whisper_loss=0.08613, over 23112.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01007, ecapa_loss=0.0001697, whisper_loss=0.09054, over 2078036.18 frames. ], batch size: 94, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:26:57,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2030430.0, ans=0.125 2024-08-13 06:27:13,690 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 06:27:39,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2030630.0, ans=0.0 2024-08-13 06:28:27,540 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 200, loss[loss=0.09427, beats_loss=0.01003, ecapa_loss=0.0001698, whisper_loss=0.08255, over 19916.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01011, ecapa_loss=0.0001708, whisper_loss=0.08953, over 2420425.96 frames. ], batch size: 83, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:28:57,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2031030.0, ans=0.0 2024-08-13 06:29:00,117 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 06:29:07,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2031130.0, ans=0.125 2024-08-13 06:29:17,105 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 06:29:21,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.67 vs. 
limit=22.5 2024-08-13 06:29:26,958 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-13 06:29:39,730 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.436e+01 2.755e+01 3.099e+01 4.760e+01, threshold=5.509e+01, percent-clipped=0.0 2024-08-13 06:29:41,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2031330.0, ans=0.125 2024-08-13 06:29:51,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 250, loss[loss=0.1239, beats_loss=0.007361, ecapa_loss=0.0001555, whisper_loss=0.115, over 16979.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01023, ecapa_loss=0.0001683, whisper_loss=0.09066, over 2713723.78 frames. ], batch size: 63, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:29:57,641 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 06:30:07,499 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=22.5 2024-08-13 06:30:12,569 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.93 vs. limit=10.0 2024-08-13 06:30:32,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2031630.0, ans=0.125 2024-08-13 06:30:58,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.67 vs. 
limit=15.0 2024-08-13 06:31:07,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2031830.0, ans=0.125 2024-08-13 06:31:08,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2031830.0, ans=0.0 2024-08-13 06:31:13,738 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 300, loss[loss=0.09247, beats_loss=0.01028, ecapa_loss=0.0001735, whisper_loss=0.08045, over 16647.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.0001668, whisper_loss=0.09074, over 2950386.08 frames. ], batch size: 61, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:31:18,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2031930.0, ans=0.05 2024-08-13 06:31:19,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2031930.0, ans=0.0 2024-08-13 06:31:19,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2031930.0, ans=0.1 2024-08-13 06:31:29,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2024-08-13 06:31:46,595 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
21 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 06:31:48,631 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06791721284389496, model_norm_threshold=55.09401321411133 2024-08-13 06:31:48,819 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.98, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.429e+05, grad_sumsq=7.164e+04, orig_rms_sq=8.974e+00 2024-08-13 06:32:01,767 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 15 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 06:32:11,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2032230.0, ans=0.1 2024-08-13 06:32:15,487 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 06:32:25,763 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.443e+01 2.713e+01 2.990e+01 8.112e+02, threshold=5.427e+01, percent-clipped=1.0 2024-08-13 06:32:28,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2032330.0, ans=0.0 2024-08-13 06:32:37,010 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 350, loss[loss=0.1219, beats_loss=0.008553, ecapa_loss=0.0002081, whisper_loss=0.1113, over 20623.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001674, whisper_loss=0.09053, over 3133109.72 frames. ], batch size: 85, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:32:56,432 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.50 vs. limit=22.5 2024-08-13 06:32:56,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.82 vs. 
limit=22.5 2024-08-13 06:33:05,118 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 26 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 06:33:12,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2032630.0, ans=0.1 2024-08-13 06:33:26,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2032730.0, ans=0.0 2024-08-13 06:33:38,508 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 06:33:57,761 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 400, loss[loss=0.0847, beats_loss=0.008789, ecapa_loss=0.0002024, whisper_loss=0.07388, over 15459.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001674, whisper_loss=0.0906, over 3296986.73 frames. ], batch size: 63, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:34:00,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2032930.0, ans=0.05 2024-08-13 06:34:07,702 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 06:34:27,201 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 19 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-13 06:34:28,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2033030.0, ans=0.1 2024-08-13 06:34:52,864 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
20 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-13 06:35:06,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.548e+01 2.826e+01 3.113e+01 9.410e+01, threshold=5.653e+01, percent-clipped=3.0 2024-08-13 06:35:17,994 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 450, loss[loss=0.1071, beats_loss=0.009947, ecapa_loss=0.0001627, whisper_loss=0.09549, over 22607.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001669, whisper_loss=0.08973, over 3415709.29 frames. ], batch size: 91, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:35:19,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2033430.0, ans=0.0 2024-08-13 06:35:20,933 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 06:35:21,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2033430.0, ans=0.0 2024-08-13 06:35:41,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2033530.0, ans=0.125 2024-08-13 06:36:34,221 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 06:36:35,868 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 06:36:37,347 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 500, loss[loss=0.1029, beats_loss=0.01184, ecapa_loss=0.0002115, whisper_loss=0.08899, over 13361.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001659, whisper_loss=0.0895, over 3504159.46 frames. ], batch size: 55, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:36:45,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.10 vs. 
limit=22.5 2024-08-13 06:36:53,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2034030.0, ans=0.2 2024-08-13 06:37:01,056 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 06:37:07,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.72 vs. limit=22.5 2024-08-13 06:37:10,119 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 33 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 06:37:13,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2034130.0, ans=0.125 2024-08-13 06:37:13,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.69 vs. limit=10.0 2024-08-13 06:37:14,079 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.66 vs. limit=15.0 2024-08-13 06:37:21,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0 2024-08-13 06:37:27,817 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-13 06:37:45,259 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.386e+01 2.704e+01 2.981e+01 6.756e+01, threshold=5.408e+01, percent-clipped=1.0 2024-08-13 06:37:51,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2034330.0, ans=0.125 2024-08-13 06:37:56,435 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 550, loss[loss=0.1009, beats_loss=0.01147, ecapa_loss=0.0001372, whisper_loss=0.08809, over 20260.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001645, whisper_loss=0.08957, over 3593633.02 frames. ], batch size: 78, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:37:56,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2034430.0, ans=0.125 2024-08-13 06:37:58,324 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 06:38:00,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-08-13 06:38:14,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2034530.0, ans=0.0 2024-08-13 06:38:16,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2034530.0, ans=0.2 2024-08-13 06:38:21,469 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2024-08-13 06:38:38,530 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 06:38:40,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2034630.0, ans=0.09899494936611666 2024-08-13 06:39:17,196 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 600, loss[loss=0.1042, beats_loss=0.009949, ecapa_loss=0.0001709, whisper_loss=0.09258, over 21732.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001643, whisper_loss=0.09032, over 3633175.05 frames. ], batch size: 88, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:39:48,862 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 06:40:00,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2035130.0, ans=0.0 2024-08-13 06:40:01,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2035130.0, ans=0.125 2024-08-13 06:40:12,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2035230.0, ans=0.125 2024-08-13 06:40:13,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2035230.0, ans=0.2 2024-08-13 06:40:25,643 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.434e+01 2.721e+01 3.072e+01 6.546e+01, threshold=5.441e+01, percent-clipped=1.0 2024-08-13 06:40:36,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2035430.0, ans=0.07 2024-08-13 06:40:37,796 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 650, loss[loss=0.09504, beats_loss=0.01316, ecapa_loss=0.0001434, whisper_loss=0.08044, over 16602.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.000165, whisper_loss=0.09076, over 3688902.24 frames. ], batch size: 62, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:40:52,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2035430.0, ans=0.0 2024-08-13 06:40:54,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2035530.0, ans=0.125 2024-08-13 06:41:06,513 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 25 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-13 06:41:21,621 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 06:41:23,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2035630.0, ans=0.125 2024-08-13 06:41:36,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2035730.0, ans=0.0 2024-08-13 06:41:40,939 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 06:41:43,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2035830.0, ans=0.125 2024-08-13 06:41:59,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 700, loss[loss=0.1249, beats_loss=0.01279, ecapa_loss=0.0001175, whisper_loss=0.111, over 16263.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.0001631, whisper_loss=0.09104, over 3708493.23 frames. ], batch size: 61, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:42:03,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2035930.0, ans=0.1 2024-08-13 06:42:03,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2035930.0, ans=0.0 2024-08-13 06:42:17,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2036030.0, ans=0.0 2024-08-13 06:42:26,509 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
28 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 06:42:35,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2036130.0, ans=0.1 2024-08-13 06:42:40,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2036130.0, ans=0.2 2024-08-13 06:42:41,435 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 06:42:41,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2036130.0, ans=0.125 2024-08-13 06:42:54,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2036230.0, ans=0.125 2024-08-13 06:42:56,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2036230.0, ans=0.1 2024-08-13 06:42:59,151 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 06:43:00,826 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 06:43:08,488 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.352e+01 2.612e+01 3.001e+01 5.116e+01, threshold=5.224e+01, percent-clipped=0.0 2024-08-13 06:43:13,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2036330.0, ans=0.04949747468305833 2024-08-13 06:43:19,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 750, loss[loss=0.07732, beats_loss=0.01161, ecapa_loss=0.0001426, whisper_loss=0.06428, over 15318.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.0001622, whisper_loss=0.09014, over 3732605.66 frames. 
], batch size: 61, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:43:27,873 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=15.0 2024-08-13 06:43:43,545 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 06:43:45,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2036530.0, ans=0.125 2024-08-13 06:43:50,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2036630.0, ans=0.5 2024-08-13 06:44:06,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2036730.0, ans=0.125 2024-08-13 06:44:12,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2036730.0, ans=0.04949747468305833 2024-08-13 06:44:25,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2036830.0, ans=0.125 2024-08-13 06:44:28,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2036830.0, ans=0.2 2024-08-13 06:44:32,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2036830.0, ans=0.2 2024-08-13 06:44:37,081 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 800, loss[loss=0.08334, beats_loss=0.01063, ecapa_loss=0.0001505, whisper_loss=0.0712, over 18712.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01083, ecapa_loss=0.0001616, whisper_loss=0.08987, over 3776903.83 frames. ], batch size: 72, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:44:38,497 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 06:44:40,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2036930.0, ans=0.125 2024-08-13 06:44:41,779 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 06:44:53,323 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 06:45:00,044 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.32 vs. limit=22.5 2024-08-13 06:45:08,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2037130.0, ans=0.125 2024-08-13 06:45:08,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2037130.0, ans=0.07 2024-08-13 06:45:33,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2037230.0, ans=0.125 2024-08-13 06:45:36,027 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-13 06:45:43,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.386e+01 2.631e+01 2.954e+01 1.989e+02, threshold=5.262e+01, percent-clipped=2.0 2024-08-13 06:45:43,764 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 06:45:53,631 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 850, loss[loss=0.09121, beats_loss=0.007554, ecapa_loss=0.0002165, whisper_loss=0.08149, over 17394.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01075, ecapa_loss=0.0001627, whisper_loss=0.08967, over 3759877.16 frames. 
], batch size: 75, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:45:54,079 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.021e+01 2024-08-13 06:46:03,460 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-13 06:46:24,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2037630.0, ans=0.0 2024-08-13 06:46:25,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2037630.0, ans=0.04949747468305833 2024-08-13 06:46:34,687 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 06:46:36,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2037630.0, ans=0.2 2024-08-13 06:46:45,015 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-13 06:46:45,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2024-08-13 06:46:50,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2024-08-13 06:46:56,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2037830.0, ans=0.125 2024-08-13 06:46:57,861 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 06:47:10,172 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 900, loss[loss=0.106, beats_loss=0.01137, ecapa_loss=0.000151, whisper_loss=0.09307, over 23814.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01073, ecapa_loss=0.0001623, whisper_loss=0.08996, over 3737304.49 frames. ], batch size: 93, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:47:20,301 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 06:47:41,969 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 06:47:59,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2038230.0, ans=0.0 2024-08-13 06:48:11,895 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.95 vs. limit=10.0 2024-08-13 06:48:12,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.393e+01 2.649e+01 3.126e+01 8.192e+01, threshold=5.298e+01, percent-clipped=1.0 2024-08-13 06:48:22,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 950, loss[loss=0.1191, beats_loss=0.009307, ecapa_loss=0.0001605, whisper_loss=0.1082, over 23241.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001612, whisper_loss=0.08994, over 3747933.73 frames. ], batch size: 88, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:48:22,721 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-13 06:48:39,225 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 06:48:59,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2038630.0, ans=0.125 2024-08-13 06:48:59,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2038630.0, ans=0.0 2024-08-13 06:49:01,520 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 06:49:11,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2038730.0, ans=0.0 2024-08-13 06:49:13,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.17 vs. limit=22.5 2024-08-13 06:49:19,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2038730.0, ans=0.125 2024-08-13 06:49:28,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.49 vs. limit=10.0 2024-08-13 06:49:32,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2038830.0, ans=0.125 2024-08-13 06:49:44,434 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1000, loss[loss=0.1248, beats_loss=0.009929, ecapa_loss=0.0001604, whisper_loss=0.1132, over 21445.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0108, ecapa_loss=0.0001617, whisper_loss=0.08996, over 3739696.02 frames. ], batch size: 83, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:49:51,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2038930.0, ans=0.1 2024-08-13 06:49:54,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-08-13 06:49:56,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2038930.0, ans=0.0 2024-08-13 06:49:57,681 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
26 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-13 06:49:58,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2038930.0, ans=0.0 2024-08-13 06:50:20,388 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 06:50:22,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.87 vs. limit=22.5 2024-08-13 06:50:37,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2039230.0, ans=0.125 2024-08-13 06:50:55,675 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.424e+01 2.734e+01 3.160e+01 9.771e+01, threshold=5.467e+01, percent-clipped=3.0 2024-08-13 06:51:05,539 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1050, loss[loss=0.09955, beats_loss=0.0106, ecapa_loss=0.0001948, whisper_loss=0.087, over 21648.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0108, ecapa_loss=0.0001622, whisper_loss=0.08933, over 3733226.13 frames. ], batch size: 92, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:51:32,690 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.25 vs. limit=15.0 2024-08-13 06:51:32,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2024-08-13 06:51:59,361 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.02 vs. 
limit=22.5 2024-08-13 06:52:02,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2039730.0, ans=0.125 2024-08-13 06:52:12,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2039830.0, ans=0.05 2024-08-13 06:52:20,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1100, loss[loss=0.0852, beats_loss=0.01308, ecapa_loss=0.0001573, whisper_loss=0.07055, over 20319.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01081, ecapa_loss=0.0001626, whisper_loss=0.08919, over 3753838.78 frames. ], batch size: 84, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:52:24,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2039930.0, ans=0.0 2024-08-13 06:52:26,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2039930.0, ans=0.0 2024-08-13 06:52:36,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2040030.0, ans=0.0 2024-08-13 06:52:36,870 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0 2024-08-13 06:52:59,302 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2024-08-13 06:53:08,877 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
18 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-13 06:53:24,785 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.366e+01 2.661e+01 3.055e+01 5.230e+01, threshold=5.322e+01, percent-clipped=0.0 2024-08-13 06:53:29,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2040330.0, ans=0.125 2024-08-13 06:53:32,927 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1150, loss[loss=0.09336, beats_loss=0.0116, ecapa_loss=0.0001514, whisper_loss=0.08025, over 19168.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01075, ecapa_loss=0.0001634, whisper_loss=0.08918, over 3766589.87 frames. ], batch size: 74, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:53:39,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2040430.0, ans=0.0 2024-08-13 06:54:07,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2040630.0, ans=0.0 2024-08-13 06:54:16,386 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 06:54:19,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2040730.0, ans=0.0 2024-08-13 06:54:19,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2040730.0, ans=0.05 2024-08-13 06:54:27,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2040730.0, ans=0.0 2024-08-13 06:54:36,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.19 vs. 
limit=12.0 2024-08-13 06:54:42,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2040830.0, ans=0.0 2024-08-13 06:54:44,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2040930.0, ans=0.0 2024-08-13 06:54:44,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1200, loss[loss=0.1008, beats_loss=0.008194, ecapa_loss=0.000205, whisper_loss=0.09056, over 22468.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01076, ecapa_loss=0.0001619, whisper_loss=0.09034, over 3766715.94 frames. ], batch size: 91, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:54:46,466 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-13 06:54:54,663 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-13 06:54:55,865 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 06:55:01,297 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.78 vs. limit=22.5 2024-08-13 06:55:21,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.10 vs. limit=10.0 2024-08-13 06:55:31,649 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.64 vs. 
limit=15.0 2024-08-13 06:55:36,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2041230.0, ans=0.125 2024-08-13 06:55:46,663 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.369e+01 2.676e+01 3.078e+01 7.518e+01, threshold=5.351e+01, percent-clipped=1.0 2024-08-13 06:55:51,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2041330.0, ans=0.1 2024-08-13 06:55:54,720 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1250, loss[loss=0.09503, beats_loss=0.0111, ecapa_loss=0.000177, whisper_loss=0.08216, over 14106.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.000164, whisper_loss=0.09012, over 3740692.75 frames. ], batch size: 54, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:56:00,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2041430.0, ans=0.125 2024-08-13 06:56:20,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2041630.0, ans=0.04949747468305833 2024-08-13 06:56:22,547 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 06:56:40,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2024-08-13 06:56:52,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2041830.0, ans=0.0 2024-08-13 06:56:59,981 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 06:57:01,242 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1300, loss[loss=0.0904, beats_loss=0.01227, ecapa_loss=0.0002322, whisper_loss=0.07581, over 18375.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001633, whisper_loss=0.09066, over 3785430.30 frames. ], batch size: 82, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:57:13,458 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 06:57:16,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2042030.0, ans=0.1 2024-08-13 06:57:23,045 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-13 06:57:28,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.28 vs. limit=22.5 2024-08-13 06:57:31,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2042130.0, ans=0.0 2024-08-13 06:57:39,550 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 06:57:40,173 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.95 vs. 
limit=15.0 2024-08-13 06:57:45,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2042230.0, ans=0.125 2024-08-13 06:57:57,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2042330.0, ans=0.125 2024-08-13 06:57:57,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2024-08-13 06:57:59,130 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.323e+01 2.630e+01 3.145e+01 6.794e+01, threshold=5.259e+01, percent-clipped=2.0 2024-08-13 06:57:59,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2042330.0, ans=0.05 2024-08-13 06:58:04,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2042330.0, ans=0.125 2024-08-13 06:58:07,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1350, loss[loss=0.09253, beats_loss=0.008052, ecapa_loss=0.000185, whisper_loss=0.08263, over 20267.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001639, whisper_loss=0.09104, over 3803030.07 frames. ], batch size: 79, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:58:12,190 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 06:58:14,932 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 06:58:18,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2042430.0, ans=0.0 2024-08-13 06:58:28,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2042530.0, ans=0.125 2024-08-13 06:58:33,573 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 06:58:36,075 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-13 06:58:51,670 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-13 06:58:54,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2042730.0, ans=0.0 2024-08-13 06:58:58,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2042830.0, ans=0.1 2024-08-13 06:59:00,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5 2024-08-13 06:59:06,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2042830.0, ans=0.125 2024-08-13 06:59:08,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2042830.0, ans=0.1 2024-08-13 06:59:12,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2042930.0, ans=0.125 2024-08-13 06:59:13,146 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1400, loss[loss=0.1173, beats_loss=0.01118, ecapa_loss=0.0001624, whisper_loss=0.1045, over 22222.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.0001626, whisper_loss=0.09144, over 3836481.85 frames. 
], batch size: 89, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:59:17,471 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 06:59:19,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2042930.0, ans=0.0 2024-08-13 06:59:44,003 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.39 vs. limit=10.0 2024-08-13 07:00:12,635 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.355e+01 2.665e+01 2.989e+01 4.736e+01, threshold=5.330e+01, percent-clipped=0.0 2024-08-13 07:00:14,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2043330.0, ans=0.2 2024-08-13 07:00:14,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=12.0 2024-08-13 07:00:20,724 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1450, loss[loss=0.09677, beats_loss=0.01313, ecapa_loss=0.0001651, whisper_loss=0.08199, over 18942.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0107, ecapa_loss=0.0001625, whisper_loss=0.09158, over 3830206.20 frames. ], batch size: 76, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:00:44,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2043430.0, ans=0.07 2024-08-13 07:00:46,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2043430.0, ans=0.5 2024-08-13 07:01:08,772 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
33 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 07:01:11,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2043630.0, ans=0.07 2024-08-13 07:01:20,751 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.409e-01 2024-08-13 07:01:26,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2043730.0, ans=0.125 2024-08-13 07:01:51,733 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1500, loss[loss=0.1055, beats_loss=0.009346, ecapa_loss=0.0001746, whisper_loss=0.09441, over 22792.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001625, whisper_loss=0.09084, over 3850522.76 frames. ], batch size: 89, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:02:47,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2044330.0, ans=0.125 2024-08-13 07:02:51,413 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.422e+01 2.612e+01 2.997e+01 7.275e+01, threshold=5.223e+01, percent-clipped=1.0 2024-08-13 07:02:59,617 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1550, loss[loss=0.11, beats_loss=0.006141, ecapa_loss=0.0002182, whisper_loss=0.1017, over 17695.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001618, whisper_loss=0.09078, over 3819066.38 frames. ], batch size: 70, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:02:59,803 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-13 07:03:12,397 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 07:03:30,817 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 07:03:31,217 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.35 vs. limit=22.5 2024-08-13 07:03:33,675 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-13 07:03:38,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2044630.0, ans=0.95 2024-08-13 07:03:42,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2044730.0, ans=0.125 2024-08-13 07:03:49,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2044730.0, ans=0.1 2024-08-13 07:03:56,706 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.20 vs. limit=22.5 2024-08-13 07:04:08,975 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5 2024-08-13 07:04:09,473 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1600, loss[loss=0.1047, beats_loss=0.009677, ecapa_loss=0.0001665, whisper_loss=0.09332, over 22950.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001612, whisper_loss=0.09125, over 3835320.70 frames. ], batch size: 93, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:04:11,298 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 07:04:12,676 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
25 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-13 07:05:03,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2045230.0, ans=0.025 2024-08-13 07:05:07,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2045330.0, ans=0.2 2024-08-13 07:05:10,967 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.418e+01 2.670e+01 2.986e+01 1.271e+02, threshold=5.339e+01, percent-clipped=4.0 2024-08-13 07:05:16,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2045330.0, ans=0.0 2024-08-13 07:05:20,010 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1650, loss[loss=0.08719, beats_loss=0.00957, ecapa_loss=0.0001784, whisper_loss=0.07583, over 18618.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01071, ecapa_loss=0.0001612, whisper_loss=0.09159, over 3842805.41 frames. ], batch size: 75, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:05:25,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.77 vs. limit=22.5 2024-08-13 07:05:51,721 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 07:05:58,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2045630.0, ans=0.125 2024-08-13 07:06:01,021 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.35 vs. 
limit=15.0 2024-08-13 07:06:17,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2045830.0, ans=0.2 2024-08-13 07:06:29,152 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1700, loss[loss=0.09296, beats_loss=0.007186, ecapa_loss=0.0001694, whisper_loss=0.08408, over 13377.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01071, ecapa_loss=0.000161, whisper_loss=0.09137, over 3829065.58 frames. ], batch size: 53, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:06:40,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2045930.0, ans=0.0 2024-08-13 07:06:55,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2046030.0, ans=0.05 2024-08-13 07:07:04,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2046130.0, ans=0.0 2024-08-13 07:07:06,161 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-13 07:07:12,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2046230.0, ans=0.1 2024-08-13 07:07:18,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2046230.0, ans=0.0 2024-08-13 07:07:22,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.55 vs. limit=15.0 2024-08-13 07:07:31,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.428e+01 2.659e+01 3.089e+01 1.627e+02, threshold=5.319e+01, percent-clipped=1.0 2024-08-13 07:07:35,885 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 07:07:36,225 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:07:36,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2046330.0, ans=0.0 2024-08-13 07:07:39,817 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1750, loss[loss=0.0953, beats_loss=0.01056, ecapa_loss=0.0001516, whisper_loss=0.08322, over 19458.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001599, whisper_loss=0.09117, over 3879592.94 frames. ], batch size: 76, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:07:48,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2046430.0, ans=0.125 2024-08-13 07:08:08,243 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 16 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-13 07:08:11,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2046630.0, ans=0.0 2024-08-13 07:08:22,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2046730.0, ans=0.0 2024-08-13 07:08:23,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2046730.0, ans=0.1 2024-08-13 07:08:47,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2046830.0, ans=0.125 2024-08-13 07:08:48,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.08 vs. 
limit=12.0 2024-08-13 07:08:49,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1800, loss[loss=0.1175, beats_loss=0.009885, ecapa_loss=0.0001509, whisper_loss=0.1061, over 22397.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001586, whisper_loss=0.0907, over 3867428.46 frames. ], batch size: 88, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:08:57,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2024-08-13 07:08:58,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2046930.0, ans=0.125 2024-08-13 07:09:02,554 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-13 07:09:02,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2047030.0, ans=0.0 2024-08-13 07:09:07,929 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-13 07:09:09,227 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 07:09:29,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.81 vs. limit=22.5 2024-08-13 07:09:33,948 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 36 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 07:09:34,759 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:09:40,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2047230.0, ans=0.125 2024-08-13 07:09:50,545 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
20 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 07:09:51,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.458e+01 2.695e+01 3.131e+01 5.479e+01, threshold=5.391e+01, percent-clipped=1.0 2024-08-13 07:09:59,906 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1850, loss[loss=0.08479, beats_loss=0.01081, ecapa_loss=0.0001546, whisper_loss=0.07244, over 17256.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001582, whisper_loss=0.09086, over 3848975.38 frames. ], batch size: 69, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:10:06,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2047430.0, ans=0.2 2024-08-13 07:10:06,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2047430.0, ans=0.04949747468305833 2024-08-13 07:10:08,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2047430.0, ans=0.09899494936611666 2024-08-13 07:10:13,941 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 07:10:32,680 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 07:10:45,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=12.0 2024-08-13 07:10:47,880 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 12 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 07:11:08,074 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1900, loss[loss=0.09771, beats_loss=0.01366, ecapa_loss=0.0001468, whisper_loss=0.08259, over 17702.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001598, whisper_loss=0.09078, over 3805041.70 frames. 
], batch size: 73, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:11:27,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2048030.0, ans=0.125 2024-08-13 07:11:32,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2048030.0, ans=0.5 2024-08-13 07:11:55,340 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-13 07:12:02,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2048330.0, ans=0.125 2024-08-13 07:12:02,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2024-08-13 07:12:09,309 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.366e+01 2.636e+01 3.036e+01 8.197e+01, threshold=5.272e+01, percent-clipped=3.0 2024-08-13 07:12:11,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2048330.0, ans=0.125 2024-08-13 07:12:17,852 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 1950, loss[loss=0.09454, beats_loss=0.01091, ecapa_loss=0.0001504, whisper_loss=0.08213, over 17045.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001622, whisper_loss=0.0907, over 3787727.05 frames. ], batch size: 68, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:12:34,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2048530.0, ans=0.0 2024-08-13 07:12:34,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2048530.0, ans=0.2 2024-08-13 07:12:53,942 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
24 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-13 07:13:09,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2048730.0, ans=22.5 2024-08-13 07:13:23,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2048830.0, ans=0.125 2024-08-13 07:13:33,749 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2000, loss[loss=0.08654, beats_loss=0.01436, ecapa_loss=0.0001385, whisper_loss=0.0708, over 19578.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001609, whisper_loss=0.09019, over 3813241.14 frames. ], batch size: 85, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:13:48,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2049030.0, ans=0.1 2024-08-13 07:13:48,974 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-13 07:13:51,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2049030.0, ans=0.125 2024-08-13 07:14:16,251 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 07:14:27,301 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 07:14:42,728 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.431e+01 2.683e+01 2.951e+01 6.273e+01, threshold=5.366e+01, percent-clipped=2.0 2024-08-13 07:14:49,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.87 vs. 
limit=15.0 2024-08-13 07:14:50,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2049430.0, ans=0.0 2024-08-13 07:14:51,652 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2050, loss[loss=0.1022, beats_loss=0.01163, ecapa_loss=0.0001701, whisper_loss=0.08883, over 22987.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01082, ecapa_loss=0.0001613, whisper_loss=0.09017, over 3811299.54 frames. ], batch size: 93, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:15:15,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.12 vs. limit=15.0 2024-08-13 07:15:18,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2049530.0, ans=0.125 2024-08-13 07:15:40,923 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 07:15:44,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2049730.0, ans=0.0 2024-08-13 07:15:58,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2049830.0, ans=0.0 2024-08-13 07:16:01,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2049830.0, ans=0.0 2024-08-13 07:16:08,706 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2100, loss[loss=0.09388, beats_loss=0.01349, ecapa_loss=0.000128, whisper_loss=0.07911, over 20096.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01081, ecapa_loss=0.0001607, whisper_loss=0.09025, over 3813651.93 frames. 
], batch size: 81, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:16:12,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.42 vs. limit=22.5 2024-08-13 07:16:16,558 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 07:16:22,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0 2024-08-13 07:16:22,590 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2024-08-13 07:16:34,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2024-08-13 07:16:36,859 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:16:50,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2050130.0, ans=0.2 2024-08-13 07:16:57,660 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.51 vs. limit=15.0 2024-08-13 07:17:00,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2050230.0, ans=0.125 2024-08-13 07:17:15,210 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.374e+01 2.616e+01 2.948e+01 7.626e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-13 07:17:18,470 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 07:17:21,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2050330.0, ans=0.1 2024-08-13 07:17:24,458 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2150, loss[loss=0.1138, beats_loss=0.008944, ecapa_loss=0.0001563, whisper_loss=0.1033, over 22435.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01081, ecapa_loss=0.0001623, whisper_loss=0.09041, over 3824117.45 frames. ], batch size: 85, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:17:38,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2050530.0, ans=0.1 2024-08-13 07:18:05,785 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 07:18:19,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-08-13 07:18:29,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2050830.0, ans=0.125 2024-08-13 07:18:37,854 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2200, loss[loss=0.09639, beats_loss=0.01375, ecapa_loss=0.0001336, whisper_loss=0.08131, over 19436.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001611, whisper_loss=0.09156, over 3825945.46 frames. ], batch size: 75, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:19:11,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.69 vs. 
limit=15.0 2024-08-13 07:19:18,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2051130.0, ans=0.125 2024-08-13 07:19:32,870 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-13 07:19:34,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2051230.0, ans=0.1 2024-08-13 07:19:37,228 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 07:19:41,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2051330.0, ans=0.125 2024-08-13 07:19:43,400 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.405e+01 2.692e+01 3.101e+01 3.996e+01, threshold=5.385e+01, percent-clipped=0.0 2024-08-13 07:19:53,002 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2250, loss[loss=0.1174, beats_loss=0.01465, ecapa_loss=0.00011, whisper_loss=0.1017, over 21580.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01091, ecapa_loss=0.0001614, whisper_loss=0.09136, over 3825262.03 frames. ], batch size: 82, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:19:58,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2051430.0, ans=0.125 2024-08-13 07:20:00,687 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
31 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 07:20:02,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2051430.0, ans=0.125 2024-08-13 07:20:05,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2051430.0, ans=0.1 2024-08-13 07:20:20,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2051530.0, ans=0.0 2024-08-13 07:20:31,221 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 07:20:34,779 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 07:20:35,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.16 vs. limit=15.0 2024-08-13 07:20:38,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-08-13 07:20:39,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2051730.0, ans=0.1 2024-08-13 07:20:42,081 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 07:20:45,713 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 9 from Vox, 45 fro AS 2024-08-13 07:20:57,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2051830.0, ans=0.125 2024-08-13 07:21:10,958 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2300, loss[loss=0.09428, beats_loss=0.0128, ecapa_loss=0.0001588, whisper_loss=0.07989, over 22715.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.0001625, whisper_loss=0.09157, over 3845939.28 frames. ], batch size: 94, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:21:16,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.92 vs. limit=8.0 2024-08-13 07:21:29,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2052030.0, ans=0.125 2024-08-13 07:21:33,073 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-13 07:21:40,353 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.26 vs. limit=10.0 2024-08-13 07:21:44,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2052130.0, ans=0.0 2024-08-13 07:21:49,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-13 07:21:55,262 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:21:56,193 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 07:21:59,709 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 14 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 07:22:14,044 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
26 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-13 07:22:17,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2052330.0, ans=0.125 2024-08-13 07:22:20,168 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.551e+01 2.802e+01 3.286e+01 4.961e+01, threshold=5.604e+01, percent-clipped=0.0 2024-08-13 07:22:25,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2052330.0, ans=0.0 2024-08-13 07:22:27,880 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2350, loss[loss=0.1046, beats_loss=0.01099, ecapa_loss=0.0001956, whisper_loss=0.09168, over 17942.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001635, whisper_loss=0.09204, over 3850175.64 frames. ], batch size: 72, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:23:04,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2052630.0, ans=0.0 2024-08-13 07:23:04,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-13 07:23:23,166 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 07:23:26,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2052730.0, ans=0.125 2024-08-13 07:23:32,093 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-13 07:23:34,895 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 07:23:36,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2052830.0, ans=0.125 2024-08-13 07:23:43,123 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2400, loss[loss=0.08721, beats_loss=0.01113, ecapa_loss=0.0001949, whisper_loss=0.07413, over 13604.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01081, ecapa_loss=0.0001634, whisper_loss=0.09203, over 3861748.36 frames. ], batch size: 59, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:24:20,504 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 07:24:51,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2053330.0, ans=0.5 2024-08-13 07:24:52,566 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.349e+01 2.648e+01 3.316e+01 5.305e+01, threshold=5.296e+01, percent-clipped=0.0 2024-08-13 07:24:54,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2053330.0, ans=0.0 2024-08-13 07:25:00,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2450, loss[loss=0.09256, beats_loss=0.01033, ecapa_loss=0.0001613, whisper_loss=0.08062, over 15270.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001644, whisper_loss=0.09148, over 3842044.29 frames. ], batch size: 60, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:25:02,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2053430.0, ans=0.0 2024-08-13 07:25:21,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. 
limit=6.0 2024-08-13 07:25:34,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2053630.0, ans=0.09899494936611666 2024-08-13 07:25:42,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2053630.0, ans=0.0 2024-08-13 07:25:58,071 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-13 07:26:08,882 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0 2024-08-13 07:26:16,244 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2500, loss[loss=0.1199, beats_loss=0.00797, ecapa_loss=0.0001765, whisper_loss=0.1102, over 18199.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.0001656, whisper_loss=0.09199, over 3865251.99 frames. ], batch size: 71, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:26:16,474 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 07:26:16,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2053930.0, ans=0.125 2024-08-13 07:26:21,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2053930.0, ans=0.0 2024-08-13 07:26:35,102 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 07:26:35,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2054030.0, ans=0.125 2024-08-13 07:26:40,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2054030.0, ans=0.0 2024-08-13 07:26:46,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2054130.0, ans=0.025 2024-08-13 07:27:04,165 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 07:27:07,746 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.74 vs. limit=10.0 2024-08-13 07:27:15,678 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 31 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-13 07:27:17,025 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 07:27:25,272 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.432e+01 2.694e+01 2.986e+01 7.508e+01, threshold=5.387e+01, percent-clipped=1.0 2024-08-13 07:27:33,204 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2550, loss[loss=0.1053, beats_loss=0.01075, ecapa_loss=0.0001767, whisper_loss=0.09277, over 22632.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01074, ecapa_loss=0.0001657, whisper_loss=0.09197, over 3859970.01 frames. ], batch size: 94, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:28:09,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. 
limit=15.0 2024-08-13 07:28:13,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2054630.0, ans=0.5 2024-08-13 07:28:35,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2054830.0, ans=0.0 2024-08-13 07:28:47,206 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2600, loss[loss=0.1032, beats_loss=0.009875, ecapa_loss=0.0001779, whisper_loss=0.09158, over 22257.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01088, ecapa_loss=0.0001636, whisper_loss=0.09165, over 3869362.72 frames. ], batch size: 90, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:29:05,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=12.0 2024-08-13 07:29:07,908 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 14 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 07:29:08,410 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2024-08-13 07:29:21,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2055130.0, ans=0.0 2024-08-13 07:29:24,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2055130.0, ans=0.125 2024-08-13 07:29:31,781 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 33 from Vox, 29 fro AS 2024-08-13 07:29:42,352 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 07:29:44,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2055230.0, ans=0.0 2024-08-13 07:29:46,721 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-08-13 07:29:52,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2055330.0, ans=0.125 2024-08-13 07:29:53,333 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 22 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 07:29:56,038 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.419e+01 2.702e+01 3.112e+01 4.104e+01, threshold=5.404e+01, percent-clipped=0.0 2024-08-13 07:30:03,683 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2650, loss[loss=0.1137, beats_loss=0.008529, ecapa_loss=0.000192, whisper_loss=0.1033, over 15533.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.0001655, whisper_loss=0.09173, over 3881531.37 frames. ], batch size: 62, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:30:09,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2055430.0, ans=0.125 2024-08-13 07:30:11,375 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-08-13 07:30:42,426 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 19 from LS+wenet, 30 from Vox, 45 fro AS 2024-08-13 07:30:59,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.37 vs. 
limit=15.0 2024-08-13 07:31:09,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2055830.0, ans=0.5 2024-08-13 07:31:20,093 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2700, loss[loss=0.101, beats_loss=0.01059, ecapa_loss=0.0001622, whisper_loss=0.08882, over 22286.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01082, ecapa_loss=0.0001665, whisper_loss=0.09065, over 3885593.21 frames. ], batch size: 91, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:31:52,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2056130.0, ans=0.04949747468305833 2024-08-13 07:31:54,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2056130.0, ans=0.125 2024-08-13 07:32:12,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2056230.0, ans=0.125 2024-08-13 07:32:28,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.371e+01 2.713e+01 3.227e+01 1.003e+02, threshold=5.426e+01, percent-clipped=2.0 2024-08-13 07:32:29,122 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 21 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-13 07:32:31,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2056330.0, ans=0.2 2024-08-13 07:32:36,809 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2750, loss[loss=0.09271, beats_loss=0.01324, ecapa_loss=0.0001627, whisper_loss=0.07785, over 18333.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.0001662, whisper_loss=0.09096, over 3876900.64 frames. ], batch size: 74, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:32:37,031 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
20 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-13 07:32:45,583 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2024-08-13 07:33:01,412 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 07:33:08,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=2056630.0, ans=15.0 2024-08-13 07:33:16,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2056630.0, ans=0.0 2024-08-13 07:33:27,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2056730.0, ans=0.125 2024-08-13 07:33:34,893 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-13 07:33:36,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2056730.0, ans=0.1 2024-08-13 07:33:37,337 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2024-08-13 07:33:38,562 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 07:33:43,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2056830.0, ans=0.125 2024-08-13 07:33:47,786 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-13 07:33:55,117 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.60 vs. 
limit=22.5 2024-08-13 07:33:55,702 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2800, loss[loss=0.08925, beats_loss=0.01301, ecapa_loss=0.0001408, whisper_loss=0.07484, over 19009.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01083, ecapa_loss=0.0001654, whisper_loss=0.09075, over 3853965.55 frames. ], batch size: 78, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:33:58,678 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 07:34:06,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2056930.0, ans=0.1 2024-08-13 07:34:08,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2056930.0, ans=0.125 2024-08-13 07:34:09,148 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 07:34:20,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2057030.0, ans=0.09899494936611666 2024-08-13 07:34:24,652 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-13 07:34:42,413 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 07:34:57,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-13 07:34:57,584 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2024-08-13 07:35:03,621 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-13 07:35:07,086 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
21 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 07:35:08,226 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.448e+01 2.685e+01 2.951e+01 5.516e+01, threshold=5.370e+01, percent-clipped=1.0 2024-08-13 07:35:10,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2057330.0, ans=0.2 2024-08-13 07:35:11,593 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 07:35:15,873 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2850, loss[loss=0.1008, beats_loss=0.01139, ecapa_loss=0.0001315, whisper_loss=0.08809, over 14300.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01093, ecapa_loss=0.0001639, whisper_loss=0.09039, over 3844388.84 frames. ], batch size: 53, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:35:18,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2057430.0, ans=0.125 2024-08-13 07:35:19,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2057430.0, ans=0.2 2024-08-13 07:35:24,781 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 07:35:58,053 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-13 07:36:12,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.89 vs. limit=10.0 2024-08-13 07:36:13,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2024-08-13 07:36:35,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=16.00 vs. 
limit=15.0 2024-08-13 07:36:35,680 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-13 07:36:37,292 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 07:36:38,276 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2900, loss[loss=0.0945, beats_loss=0.01153, ecapa_loss=0.0001442, whisper_loss=0.08152, over 21717.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0109, ecapa_loss=0.000166, whisper_loss=0.09128, over 3874625.58 frames. ], batch size: 87, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:36:55,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2058030.0, ans=0.0 2024-08-13 07:37:07,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2058030.0, ans=0.0 2024-08-13 07:37:15,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2058130.0, ans=0.125 2024-08-13 07:37:49,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.431e+01 2.692e+01 3.123e+01 5.434e+01, threshold=5.383e+01, percent-clipped=1.0 2024-08-13 07:37:58,367 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 2950, loss[loss=0.09719, beats_loss=0.01003, ecapa_loss=0.0001319, whisper_loss=0.08585, over 17292.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01085, ecapa_loss=0.0001652, whisper_loss=0.09185, over 3868449.44 frames. 
], batch size: 66, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:38:03,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2058430.0, ans=0.05 2024-08-13 07:38:10,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2058430.0, ans=0.0 2024-08-13 07:38:33,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2058630.0, ans=0.125 2024-08-13 07:38:52,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2058730.0, ans=0.1 2024-08-13 07:38:57,975 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 07:38:59,736 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-13 07:39:03,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2058730.0, ans=0.125 2024-08-13 07:39:23,166 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3000, loss[loss=0.09215, beats_loss=0.01166, ecapa_loss=0.0001909, whisper_loss=0.07859, over 22535.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01081, ecapa_loss=0.0001648, whisper_loss=0.09197, over 3908261.59 frames. 
], batch size: 96, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:39:23,167 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 07:40:01,156 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.9746e-05, 1.0514e-02, 1.1804e-03, 2.1558e+00, 5.5797e-03, 3.2528e-02, 7.2015e-03, 1.0620e-02], device='cuda:2') 2024-08-13 07:40:01,972 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on ASR_libri: loss=0.2552, beats_loss=0, ecapa_loss=0.0005768, whisper_loss=0.2494, over 922467.00 frames. 2024-08-13 07:40:18,777 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.5837, 1.3659, 1.7340, 1.6740], device='cuda:2') 2024-08-13 07:40:19,501 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on SV_voxceleb1: loss=0.00457, beats_loss=0, ecapa_loss=0.000457, whisper_loss=0, over 939242.00 frames. 2024-08-13 07:42:09,775 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on AT_audioset: loss=0.02377, beats_loss=0.02377, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 07:42:09,778 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-13 07:43:04,874 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
27 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 07:43:18,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2059230.0, ans=0.125 2024-08-13 07:43:19,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2059330.0, ans=0.125 2024-08-13 07:43:23,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2059330.0, ans=0.025 2024-08-13 07:43:27,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2059330.0, ans=0.125 2024-08-13 07:43:30,037 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.520e+01 2.888e+01 3.342e+01 5.667e+01, threshold=5.776e+01, percent-clipped=1.0 2024-08-13 07:43:34,094 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 07:43:38,316 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3050, loss[loss=0.1003, beats_loss=0.00947, ecapa_loss=0.0001963, whisper_loss=0.0889, over 22548.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01078, ecapa_loss=0.0001662, whisper_loss=0.09241, over 3928097.03 frames. ], batch size: 92, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:43:56,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2059530.0, ans=0.05 2024-08-13 07:44:00,999 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-13 07:44:11,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2059630.0, ans=0.0 2024-08-13 07:44:26,620 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 07:44:44,085 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 27 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 07:45:03,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2059930.0, ans=0.125 2024-08-13 07:45:04,420 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3100, loss[loss=0.1047, beats_loss=0.01092, ecapa_loss=0.0001792, whisper_loss=0.09198, over 18443.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0108, ecapa_loss=0.000166, whisper_loss=0.09216, over 3923474.31 frames. ], batch size: 73, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:45:09,676 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 07:45:39,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2060130.0, ans=0.125 2024-08-13 07:45:44,681 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-13 07:45:44,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2060130.0, ans=0.2 2024-08-13 07:45:51,030 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 07:46:00,610 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-13 07:46:12,467 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 07:46:19,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2060330.0, ans=0.0 2024-08-13 07:46:22,547 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.355e+01 2.648e+01 2.914e+01 4.175e+01, threshold=5.296e+01, percent-clipped=0.0 2024-08-13 07:46:30,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3150, loss[loss=0.109, beats_loss=0.01161, ecapa_loss=0.0001521, whisper_loss=0.09585, over 23043.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01088, ecapa_loss=0.0001651, whisper_loss=0.09155, over 3860232.42 frames. ], batch size: 89, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:46:43,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2060430.0, ans=0.125 2024-08-13 07:46:49,382 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 07:46:50,232 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2024-08-13 07:46:59,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2060530.0, ans=0.125 2024-08-13 07:47:05,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2060630.0, ans=0.1 2024-08-13 07:47:06,227 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 07:47:25,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2060730.0, ans=0.05 2024-08-13 07:47:25,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2024-08-13 07:47:36,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2024-08-13 07:47:46,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2060830.0, ans=0.2 2024-08-13 07:47:58,472 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-13 07:47:59,456 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3200, loss[loss=0.09884, beats_loss=0.01318, ecapa_loss=0.0001608, whisper_loss=0.08405, over 22406.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01096, ecapa_loss=0.0001644, whisper_loss=0.0911, over 3857724.92 frames. ], batch size: 92, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:49:17,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.368e+01 2.637e+01 3.039e+01 1.272e+02, threshold=5.274e+01, percent-clipped=1.0 2024-08-13 07:49:18,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.60 vs. limit=12.0 2024-08-13 07:49:25,827 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3250, loss[loss=0.103, beats_loss=0.01128, ecapa_loss=0.0001574, whisper_loss=0.09015, over 21117.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01093, ecapa_loss=0.0001649, whisper_loss=0.09149, over 3846988.06 frames. 
], batch size: 83, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:49:32,109 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2024-08-13 07:49:42,107 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 07:49:48,490 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 07:49:48,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2024-08-13 07:49:59,693 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-13 07:50:03,331 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 07:50:11,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2061630.0, ans=0.95 2024-08-13 07:50:19,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2061730.0, ans=0.125 2024-08-13 07:50:26,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2061730.0, ans=0.0 2024-08-13 07:50:33,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2024-08-13 07:50:39,598 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 07:50:41,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2061830.0, ans=0.0 2024-08-13 07:50:50,042 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3300, loss[loss=0.06177, beats_loss=0.0126, ecapa_loss=0.0002075, whisper_loss=0.04709, over 13358.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01089, ecapa_loss=0.0001649, whisper_loss=0.0919, over 3870912.29 frames. ], batch size: 56, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:50:53,802 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-13 07:50:59,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2061930.0, ans=0.95 2024-08-13 07:51:02,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2061930.0, ans=0.125 2024-08-13 07:51:34,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2062130.0, ans=22.5 2024-08-13 07:51:39,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2024-08-13 07:51:44,330 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 07:52:00,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2062330.0, ans=0.2 2024-08-13 07:52:02,426 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:52:06,975 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 07:52:08,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.435e+01 2.813e+01 3.312e+01 6.245e+01, threshold=5.626e+01, percent-clipped=3.0 2024-08-13 07:52:11,421 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-13 07:52:12,722 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 24 from Vox, 17 fro AS 2024-08-13 07:52:15,984 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3350, loss[loss=0.1066, beats_loss=0.01003, ecapa_loss=0.0001659, whisper_loss=0.09487, over 22548.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01083, ecapa_loss=0.0001637, whisper_loss=0.09212, over 3872914.69 frames. ], batch size: 88, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:52:30,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2062430.0, ans=0.1 2024-08-13 07:53:10,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2062730.0, ans=0.0 2024-08-13 07:53:17,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2062730.0, ans=0.0 2024-08-13 07:53:38,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3400, loss[loss=0.08037, beats_loss=0.01135, ecapa_loss=0.0001804, whisper_loss=0.06722, over 16524.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01079, ecapa_loss=0.000163, whisper_loss=0.09238, over 3848654.76 frames. ], batch size: 70, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:54:04,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2063030.0, ans=0.1 2024-08-13 07:54:11,218 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 07:54:13,426 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=12.0 2024-08-13 07:54:22,419 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:54:30,910 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-13 07:54:31,497 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.71 vs. limit=22.5 2024-08-13 07:54:35,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2063230.0, ans=0.0 2024-08-13 07:54:42,823 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 17 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 07:54:43,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2063330.0, ans=0.125 2024-08-13 07:54:53,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.342e+01 2.542e+01 2.769e+01 4.852e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-13 07:55:00,202 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=6.325e-02 2024-08-13 07:55:00,982 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3450, loss[loss=0.1205, beats_loss=0.009995, ecapa_loss=0.000192, whisper_loss=0.1085, over 22903.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01086, ecapa_loss=0.0001632, whisper_loss=0.09219, over 3856339.33 frames. 
], batch size: 90, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:55:05,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2063430.0, ans=0.0 2024-08-13 07:55:06,770 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-13 07:55:07,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2063430.0, ans=0.125 2024-08-13 07:55:07,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2063430.0, ans=0.09899494936611666 2024-08-13 07:55:08,073 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-13 07:55:30,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2063530.0, ans=0.1 2024-08-13 07:55:41,546 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 07:55:48,400 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.66 vs. limit=22.5 2024-08-13 07:55:51,255 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 07:55:57,108 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 07:55:58,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2063730.0, ans=0.125 2024-08-13 07:56:03,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2063730.0, ans=0.0 2024-08-13 07:56:09,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2063830.0, ans=0.125 2024-08-13 07:56:20,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2063830.0, ans=0.125 2024-08-13 07:56:22,715 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3500, loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001873, whisper_loss=0.0899, over 21198.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01087, ecapa_loss=0.0001639, whisper_loss=0.09218, over 3860337.16 frames. ], batch size: 91, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:56:30,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2063930.0, ans=0.0 2024-08-13 07:56:37,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2063930.0, ans=0.125 2024-08-13 07:57:06,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2064130.0, ans=0.2 2024-08-13 07:57:09,789 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-13 07:57:22,112 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
21 from LS+wenet, 13 from Vox, 25 from AS 2024-08-13 07:57:36,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2064330.0, ans=0.0 2024-08-13 07:57:37,535 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.517e+01 2.798e+01 3.148e+01 5.290e+01, threshold=5.596e+01, percent-clipped=1.0 2024-08-13 07:57:42,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2064330.0, ans=0.125 2024-08-13 07:57:47,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3550, loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.0001801, whisper_loss=0.08996, over 19626.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01092, ecapa_loss=0.000164, whisper_loss=0.09144, over 3879050.99 frames. ], batch size: 78, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:57:52,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2064430.0, ans=0.125 2024-08-13 07:58:01,321 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2024-08-13 07:58:04,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2064530.0, ans=0.0 2024-08-13 07:58:08,759 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 23 from LS+wenet, 11 from Vox, 22 from AS 2024-08-13 07:58:15,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2064530.0, ans=0.125 2024-08-13 07:58:46,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2064730.0, ans=0.0 2024-08-13 07:58:58,287 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts.
16 from LS+wenet, 17 from Vox, 34 from AS 2024-08-13 07:59:01,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2064830.0, ans=0.1 2024-08-13 07:59:10,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2064930.0, ans=0.125 2024-08-13 07:59:11,601 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3600, loss[loss=0.08405, beats_loss=0.01392, ecapa_loss=0.0001536, whisper_loss=0.06859, over 19473.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01084, ecapa_loss=0.0001645, whisper_loss=0.09241, over 3873689.98 frames. ], batch size: 81, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:59:31,590 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 19 from LS+wenet, 24 from Vox, 49 from AS 2024-08-13 07:59:33,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2065030.0, ans=0.125 2024-08-13 07:59:48,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2065130.0, ans=0.125 2024-08-13 07:59:50,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs.
limit=6.0 2024-08-13 07:59:53,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2065130.0, ans=0.2 2024-08-13 08:00:27,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.371e+01 2.703e+01 3.040e+01 5.839e+01, threshold=5.406e+01, percent-clipped=1.0 2024-08-13 08:00:28,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2065330.0, ans=0.95 2024-08-13 08:00:36,741 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3650, loss[loss=0.1161, beats_loss=0.00885, ecapa_loss=0.0001674, whisper_loss=0.1056, over 22028.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01088, ecapa_loss=0.0001643, whisper_loss=0.09163, over 3876473.23 frames. ], batch size: 83, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:00:42,425 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:00:47,486 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 from AS 2024-08-13 08:00:52,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2065430.0, ans=0.125 2024-08-13 08:00:53,540 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 from AS 2024-08-13 08:00:56,018 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=15.0 2024-08-13 08:00:58,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2065530.0, ans=0.125 2024-08-13 08:01:09,205 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts.
20 from LS+wenet, 21 from Vox, 27 from AS 2024-08-13 08:01:50,679 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=12.0 2024-08-13 08:01:57,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2065830.0, ans=0.0 2024-08-13 08:01:59,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2065830.0, ans=0.125 2024-08-13 08:02:01,028 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3700, loss[loss=0.09353, beats_loss=0.01027, ecapa_loss=0.0001845, whisper_loss=0.08141, over 21071.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01078, ecapa_loss=0.0001655, whisper_loss=0.09234, over 3882631.29 frames. ], batch size: 89, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:02:34,701 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 12 from Vox, 39 from AS 2024-08-13 08:02:40,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0 2024-08-13 08:02:59,732 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 23 from Vox, 30 from AS 2024-08-13 08:03:12,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2066330.0, ans=0.025 2024-08-13 08:03:13,129 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.360e+01 2.624e+01 2.875e+01 4.532e+01, threshold=5.249e+01, percent-clipped=0.0 2024-08-13 08:03:20,791 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3750, loss[loss=0.1087, beats_loss=0.01012, ecapa_loss=0.0001467, whisper_loss=0.09716, over 23419.00 frames.
], tot_loss[loss=0.1042, beats_loss=0.01088, ecapa_loss=0.0001654, whisper_loss=0.09162, over 3878755.01 frames. ], batch size: 91, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:03:21,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2066430.0, ans=0.0 2024-08-13 08:03:30,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2066430.0, ans=0.125 2024-08-13 08:03:31,291 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 from AS 2024-08-13 08:03:38,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2066530.0, ans=0.125 2024-08-13 08:04:05,750 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 32 from Vox, 31 from AS 2024-08-13 08:04:09,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2066730.0, ans=0.125 2024-08-13 08:04:13,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2066730.0, ans=0.07 2024-08-13 08:04:31,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2066830.0, ans=0.0 2024-08-13 08:04:33,119 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 from AS 2024-08-13 08:04:33,664 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2024-08-13 08:04:43,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3800, loss[loss=0.1005, beats_loss=0.009181, ecapa_loss=0.0001868, whisper_loss=0.08947, over 17573.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01088, ecapa_loss=0.0001667, whisper_loss=0.09149, over 3900022.86 frames.
], batch size: 70, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:04:47,916 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2024-08-13 08:04:57,978 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.21 vs. limit=6.0 2024-08-13 08:05:16,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2067130.0, ans=0.1 2024-08-13 08:05:41,729 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 from AS 2024-08-13 08:05:45,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2067230.0, ans=0.2 2024-08-13 08:05:45,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2067230.0, ans=0.0 2024-08-13 08:05:51,233 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 from AS 2024-08-13 08:05:55,980 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.450e+01 2.726e+01 3.001e+01 5.077e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-13 08:05:56,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2067330.0, ans=0.07 2024-08-13 08:05:57,795 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 from AS 2024-08-13 08:06:02,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2067430.0, ans=0.125 2024-08-13 08:06:03,512 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3850, loss[loss=0.1241, beats_loss=0.009517, ecapa_loss=0.0002408, whisper_loss=0.1122, over 22792.00 frames.
], tot_loss[loss=0.1045, beats_loss=0.01087, ecapa_loss=0.0001674, whisper_loss=0.09191, over 3895588.70 frames. ], batch size: 94, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:06:31,569 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 22 from Vox, 21 from AS 2024-08-13 08:06:51,369 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.27 vs. limit=10.0 2024-08-13 08:06:57,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0 2024-08-13 08:07:02,620 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 15 from Vox, 38 from AS 2024-08-13 08:07:29,602 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3900, loss[loss=0.104, beats_loss=0.01008, ecapa_loss=0.0001924, whisper_loss=0.09196, over 21467.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0108, ecapa_loss=0.0001675, whisper_loss=0.09222, over 3902488.56 frames. ], batch size: 89, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:07:33,455 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 12 from Vox, 43 from AS 2024-08-13 08:07:36,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.60 vs.
limit=22.5 2024-08-13 08:07:37,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2067930.0, ans=0.07 2024-08-13 08:08:43,484 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.513e+01 2.786e+01 3.251e+01 6.128e+01, threshold=5.571e+01, percent-clipped=2.0 2024-08-13 08:08:44,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2068330.0, ans=0.125 2024-08-13 08:08:52,107 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 3950, loss[loss=0.1066, beats_loss=0.01047, ecapa_loss=0.0001306, whisper_loss=0.09477, over 20916.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01073, ecapa_loss=0.0001673, whisper_loss=0.09209, over 3877877.60 frames. ], batch size: 80, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:08:58,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=12.0 2024-08-13 08:09:12,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2068530.0, ans=10.0 2024-08-13 08:09:25,418 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 from AS 2024-08-13 08:09:26,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2068530.0, ans=0.0 2024-08-13 08:09:44,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2068630.0, ans=15.0 2024-08-13 08:09:48,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2068730.0, ans=0.0 2024-08-13 08:09:57,325 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts.
19 from LS+wenet, 20 from Vox, 27 from AS 2024-08-13 08:09:58,949 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 from AS 2024-08-13 08:10:17,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2068830.0, ans=0.125 2024-08-13 08:10:20,837 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4000, loss[loss=0.07275, beats_loss=0.01113, ecapa_loss=0.0002052, whisper_loss=0.05957, over 13199.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01074, ecapa_loss=0.0001682, whisper_loss=0.09189, over 3862987.04 frames. ], batch size: 58, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:10:22,828 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 from AS 2024-08-13 08:10:47,148 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-13 08:11:06,055 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 15 from Vox, 32 from AS 2024-08-13 08:11:06,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2069130.0, ans=0.035 2024-08-13 08:11:22,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2069230.0, ans=0.025 2024-08-13 08:11:30,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2069330.0, ans=0.0 2024-08-13 08:11:34,989 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.601e+01 2.360e+01 2.605e+01 2.925e+01 4.033e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-13 08:11:43,405 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4050, loss[loss=0.09353, beats_loss=0.01116, ecapa_loss=0.0001814, whisper_loss=0.08056, over 19275.00 frames.
], tot_loss[loss=0.1041, beats_loss=0.0107, ecapa_loss=0.0001676, whisper_loss=0.09168, over 3871528.19 frames. ], batch size: 75, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:11:45,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2069430.0, ans=0.0 2024-08-13 08:11:53,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2069430.0, ans=0.1 2024-08-13 08:12:18,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2069630.0, ans=0.125 2024-08-13 08:12:41,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-08-13 08:12:45,981 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 from AS 2024-08-13 08:12:56,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2069830.0, ans=0.125 2024-08-13 08:13:09,749 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4100, loss[loss=0.1063, beats_loss=0.01014, ecapa_loss=0.0001989, whisper_loss=0.09415, over 22626.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01062, ecapa_loss=0.0001689, whisper_loss=0.09237, over 3883239.39 frames.
], batch size: 92, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:13:10,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2069930.0, ans=0.0 2024-08-13 08:13:13,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2069930.0, ans=0.125 2024-08-13 08:13:15,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2069930.0, ans=0.2 2024-08-13 08:13:43,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=15.0 2024-08-13 08:14:15,751 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 41 from LS+wenet, 17 from Vox, 32 from AS 2024-08-13 08:14:24,760 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.381e+01 2.702e+01 3.113e+01 4.589e+01, threshold=5.403e+01, percent-clipped=0.0 2024-08-13 08:14:33,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4150, loss[loss=0.1179, beats_loss=0.01166, ecapa_loss=0.0001502, whisper_loss=0.1047, over 23493.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01067, ecapa_loss=0.000167, whisper_loss=0.09221, over 3886685.60 frames. ], batch size: 91, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:14:33,900 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 23 from Vox, 28 from AS 2024-08-13 08:14:55,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2070530.0, ans=0.125 2024-08-13 08:14:59,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2070530.0, ans=0.125 2024-08-13 08:15:00,648 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
38 from LS+wenet, 20 from Vox, 34 from AS 2024-08-13 08:15:02,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2070530.0, ans=0.0 2024-08-13 08:15:09,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2070630.0, ans=0.1 2024-08-13 08:15:14,296 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 25 from Vox, 31 from AS 2024-08-13 08:15:37,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2070730.0, ans=0.125 2024-08-13 08:15:53,326 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 16 from Vox, 18 from AS 2024-08-13 08:15:56,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4200, loss[loss=0.119, beats_loss=0.01045, ecapa_loss=0.0001922, whisper_loss=0.1066, over 22137.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01068, ecapa_loss=0.0001673, whisper_loss=0.0925, over 3880741.45 frames. ], batch size: 90, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:15:58,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2070930.0, ans=0.0 2024-08-13 08:16:10,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=2070930.0, ans=0.2 2024-08-13 08:16:23,467 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts.
23 from LS+wenet, 16 from Vox, 31 from AS 2024-08-13 08:16:25,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2071030.0, ans=0.125 2024-08-13 08:16:36,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2071130.0, ans=0.125 2024-08-13 08:16:40,759 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.11 vs. limit=15.0 2024-08-13 08:16:58,460 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2024-08-13 08:17:10,992 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.330e+01 2.608e+01 3.052e+01 6.792e+01, threshold=5.217e+01, percent-clipped=3.0 2024-08-13 08:17:18,439 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4250, loss[loss=0.08499, beats_loss=0.01186, ecapa_loss=0.0001603, whisper_loss=0.07152, over 17123.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01073, ecapa_loss=0.0001683, whisper_loss=0.09178, over 3864135.79 frames. ], batch size: 67, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:17:19,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2071430.0, ans=0.1 2024-08-13 08:17:28,022 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts.
14 from LS+wenet, 18 from Vox, 27 from AS 2024-08-13 08:17:28,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2071430.0, ans=0.1 2024-08-13 08:17:31,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2071430.0, ans=0.125 2024-08-13 08:17:46,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=15.0 2024-08-13 08:17:48,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2071530.0, ans=0.1 2024-08-13 08:17:54,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2071630.0, ans=0.125 2024-08-13 08:17:54,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2071630.0, ans=0.1 2024-08-13 08:18:02,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2071630.0, ans=0.125 2024-08-13 08:18:21,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2071730.0, ans=0.1 2024-08-13 08:18:30,119 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 from AS 2024-08-13 08:18:33,453 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
27 from LS+wenet, 23 from Vox, 37 from AS 2024-08-13 08:18:33,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2071830.0, ans=0.125 2024-08-13 08:18:40,854 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4300, loss[loss=0.08993, beats_loss=0.007891, ecapa_loss=0.0001431, whisper_loss=0.0806, over 16059.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01076, ecapa_loss=0.0001688, whisper_loss=0.09145, over 3878313.22 frames. ], batch size: 58, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:18:47,176 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 22 from Vox, 30 from AS 2024-08-13 08:18:47,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2071930.0, ans=0.125 2024-08-13 08:18:47,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.55 vs. limit=15.0 2024-08-13 08:18:54,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.33 vs. limit=12.0 2024-08-13 08:19:00,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.50 vs.
limit=6.0 2024-08-13 08:19:10,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2072030.0, ans=0.2 2024-08-13 08:19:20,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2072130.0, ans=0.1 2024-08-13 08:19:37,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2072230.0, ans=0.1 2024-08-13 08:19:39,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2072230.0, ans=0.125 2024-08-13 08:19:53,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.447e+01 2.714e+01 2.965e+01 4.296e+01, threshold=5.429e+01, percent-clipped=0.0 2024-08-13 08:20:00,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2072430.0, ans=0.0 2024-08-13 08:20:00,779 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4350, loss[loss=0.1061, beats_loss=0.009247, ecapa_loss=0.000171, whisper_loss=0.09514, over 16429.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01074, ecapa_loss=0.0001693, whisper_loss=0.0916, over 3879811.01 frames. ], batch size: 67, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:20:10,188 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2024-08-13 08:20:20,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2072530.0, ans=0.0 2024-08-13 08:20:21,917 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.58 vs. 
limit=10.0 2024-08-13 08:20:30,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.50 vs. limit=10.0 2024-08-13 08:20:31,720 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 30 from Vox, 34 from AS 2024-08-13 08:21:01,113 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 31 from LS+wenet, 26 from Vox, 38 from AS 2024-08-13 08:21:03,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.62 vs. limit=15.0 2024-08-13 08:21:13,401 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 30 from Vox, 34 from AS 2024-08-13 08:21:16,302 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 from AS 2024-08-13 08:21:23,786 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4400, loss[loss=0.1065, beats_loss=0.01024, ecapa_loss=0.0001758, whisper_loss=0.09447, over 20084.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001673, whisper_loss=0.0915, over 3907348.23 frames. ], batch size: 82, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:21:42,464 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 from AS 2024-08-13 08:21:53,388 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.86 vs.
limit=15.0 2024-08-13 08:22:24,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2073230.0, ans=0.1 2024-08-13 08:22:29,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2073330.0, ans=0.125 2024-08-13 08:22:35,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2073330.0, ans=0.125 2024-08-13 08:22:39,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2073330.0, ans=0.1 2024-08-13 08:22:40,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.445e+01 2.799e+01 3.126e+01 5.864e+01, threshold=5.599e+01, percent-clipped=1.0 2024-08-13 08:22:43,320 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 14 from Vox, 50 from AS 2024-08-13 08:22:47,156 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4450, loss[loss=0.1009, beats_loss=0.01078, ecapa_loss=0.0001708, whisper_loss=0.08842, over 18557.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01077, ecapa_loss=0.0001666, whisper_loss=0.09153, over 3888477.52 frames. ], batch size: 74, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:22:51,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2073430.0, ans=0.0 2024-08-13 08:22:56,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2073430.0, ans=0.0 2024-08-13 08:23:21,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.91 vs.
limit=12.0 2024-08-13 08:23:55,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2073830.0, ans=0.125 2024-08-13 08:23:59,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2073830.0, ans=0.125 2024-08-13 08:24:06,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2073930.0, ans=0.0 2024-08-13 08:24:07,755 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4500, loss[loss=0.06944, beats_loss=0.01473, ecapa_loss=0.0001252, whisper_loss=0.05345, over 17022.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001655, whisper_loss=0.09124, over 3902562.88 frames. ], batch size: 71, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:24:10,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2073930.0, ans=0.05 2024-08-13 08:24:16,224 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 16 from Vox, 31 from AS 2024-08-13 08:24:20,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2073930.0, ans=0.125 2024-08-13 08:24:21,526 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 from AS 2024-08-13 08:24:23,291 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 11 from LS+wenet, 17 from Vox, 26 from AS 2024-08-13 08:24:35,373 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS 2024-08-13 08:25:00,224 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
26 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-13 08:25:02,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2074230.0, ans=0.0 2024-08-13 08:25:15,983 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.370e+01 2.670e+01 3.024e+01 4.135e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-13 08:25:17,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2074330.0, ans=0.5 2024-08-13 08:25:23,078 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4550, loss[loss=0.1031, beats_loss=0.009333, ecapa_loss=0.0001461, whisper_loss=0.09232, over 20984.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01086, ecapa_loss=0.0001657, whisper_loss=0.09112, over 3906771.10 frames. ], batch size: 81, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:25:30,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2074430.0, ans=0.125 2024-08-13 08:25:40,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2074530.0, ans=0.0 2024-08-13 08:25:41,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2074530.0, ans=0.1 2024-08-13 08:25:50,094 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 29 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-13 08:26:01,092 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 08:26:14,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2074730.0, ans=0.125 2024-08-13 08:26:16,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2074730.0, ans=0.125 2024-08-13 08:26:24,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2074830.0, ans=0.125 2024-08-13 08:26:34,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4600, loss[loss=0.1059, beats_loss=0.009858, ecapa_loss=0.0001724, whisper_loss=0.09433, over 19980.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01082, ecapa_loss=0.0001665, whisper_loss=0.09103, over 3900472.70 frames. ], batch size: 81, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:26:42,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2074930.0, ans=0.0 2024-08-13 08:26:45,274 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-13 08:26:53,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2075030.0, ans=10.0 2024-08-13 08:26:55,251 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 08:27:01,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2075030.0, ans=0.125 2024-08-13 08:27:03,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2075130.0, ans=0.125 2024-08-13 08:27:07,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2075130.0, ans=0.1 2024-08-13 08:27:36,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2024-08-13 08:27:42,331 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.369e+01 2.617e+01 2.923e+01 4.349e+01, threshold=5.234e+01, percent-clipped=0.0 2024-08-13 08:27:48,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2075430.0, ans=0.1 2024-08-13 08:27:49,120 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4650, loss[loss=0.09871, beats_loss=0.01161, ecapa_loss=0.0001306, whisper_loss=0.08579, over 23774.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01088, ecapa_loss=0.0001665, whisper_loss=0.09061, over 3903393.35 frames. ], batch size: 91, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:27:49,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2075430.0, ans=0.0 2024-08-13 08:27:58,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2075430.0, ans=0.125 2024-08-13 08:28:02,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.03 vs. 
limit=22.5 2024-08-13 08:28:03,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2075530.0, ans=0.1 2024-08-13 08:28:07,996 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-13 08:28:38,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2075730.0, ans=0.2 2024-08-13 08:29:04,536 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4700, loss[loss=0.1013, beats_loss=0.01063, ecapa_loss=0.0001421, whisper_loss=0.08927, over 22212.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01093, ecapa_loss=0.0001663, whisper_loss=0.0905, over 3913509.50 frames. ], batch size: 88, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:29:26,611 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 08:29:32,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2076030.0, ans=0.0 2024-08-13 08:29:45,624 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 08:29:46,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2076130.0, ans=0.2 2024-08-13 08:29:48,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2076130.0, ans=0.2 2024-08-13 08:29:57,140 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 08:29:59,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.16 vs. 
limit=22.5 2024-08-13 08:30:13,092 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.493e+01 2.764e+01 3.080e+01 1.960e+02, threshold=5.528e+01, percent-clipped=2.0 2024-08-13 08:30:18,014 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.24 vs. limit=22.5 2024-08-13 08:30:20,467 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4750, loss[loss=0.107, beats_loss=0.009959, ecapa_loss=0.000176, whisper_loss=0.0953, over 21981.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.011, ecapa_loss=0.000166, whisper_loss=0.09002, over 3908059.90 frames. ], batch size: 90, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:30:32,431 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 16 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 08:30:37,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2076530.0, ans=0.0 2024-08-13 08:30:42,391 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.02 vs. limit=22.5 2024-08-13 08:30:43,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.68 vs. limit=22.5 2024-08-13 08:30:54,299 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 08:30:57,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2076630.0, ans=0.0 2024-08-13 08:31:19,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.84 vs. limit=22.5 2024-08-13 08:31:21,727 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
41 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 08:31:21,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2076830.0, ans=0.125 2024-08-13 08:31:33,353 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-13 08:31:34,483 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4800, loss[loss=0.1105, beats_loss=0.009066, ecapa_loss=0.0001881, whisper_loss=0.09957, over 14487.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.011, ecapa_loss=0.0001662, whisper_loss=0.0899, over 3894662.84 frames. ], batch size: 57, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:31:43,171 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=12.0 2024-08-13 08:31:50,231 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-13 08:31:51,798 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 08:32:18,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2077230.0, ans=0.0 2024-08-13 08:32:38,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2077330.0, ans=0.0 2024-08-13 08:32:42,241 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.414e+01 2.705e+01 2.995e+01 6.816e+01, threshold=5.410e+01, percent-clipped=1.0 2024-08-13 08:32:49,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4850, loss[loss=0.09811, beats_loss=0.01136, ecapa_loss=0.000176, whisper_loss=0.08498, over 15356.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01095, ecapa_loss=0.0001657, whisper_loss=0.09094, over 3913741.76 frames. 
], batch size: 61, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:33:29,584 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 08:33:30,822 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-13 08:33:34,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2077730.0, ans=0.125 2024-08-13 08:33:35,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2077730.0, ans=0.0 2024-08-13 08:33:41,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2077730.0, ans=0.1 2024-08-13 08:33:42,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2077730.0, ans=0.1 2024-08-13 08:33:45,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2077730.0, ans=0.1 2024-08-13 08:34:02,158 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4900, loss[loss=0.1104, beats_loss=0.008876, ecapa_loss=0.0001638, whisper_loss=0.09991, over 16806.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01097, ecapa_loss=0.0001662, whisper_loss=0.09127, over 3930558.51 frames. ], batch size: 65, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:34:21,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.31 vs. 
limit=15.0 2024-08-13 08:34:23,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2078030.0, ans=0.1 2024-08-13 08:34:33,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2078130.0, ans=0.2 2024-08-13 08:34:42,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2078130.0, ans=0.1 2024-08-13 08:34:52,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2078230.0, ans=0.1 2024-08-13 08:34:53,474 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-13 08:34:55,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2078230.0, ans=0.125 2024-08-13 08:35:06,074 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.464e+01 2.767e+01 3.041e+01 1.306e+02, threshold=5.534e+01, percent-clipped=2.0 2024-08-13 08:35:06,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2078330.0, ans=0.125 2024-08-13 08:35:12,502 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 4950, loss[loss=0.1253, beats_loss=0.008823, ecapa_loss=0.0002258, whisper_loss=0.1142, over 21599.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01097, ecapa_loss=0.0001665, whisper_loss=0.0905, over 3892973.06 frames. 
], batch size: 88, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:35:17,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2078430.0, ans=0.2 2024-08-13 08:35:18,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2078430.0, ans=0.0 2024-08-13 08:35:35,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2078530.0, ans=0.2 2024-08-13 08:35:37,997 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 08:35:48,279 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 08:35:49,698 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 08:36:00,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2078730.0, ans=0.125 2024-08-13 08:36:05,058 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 08:36:08,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2078830.0, ans=0.125 2024-08-13 08:36:08,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.71 vs. limit=22.5 2024-08-13 08:36:22,753 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5000, loss[loss=0.1254, beats_loss=0.008209, ecapa_loss=0.0001686, whisper_loss=0.1155, over 23585.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01088, ecapa_loss=0.0001668, whisper_loss=0.09128, over 3882383.37 frames. 
], batch size: 90, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:36:29,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2078930.0, ans=0.05 2024-08-13 08:36:31,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2078930.0, ans=0.125 2024-08-13 08:36:38,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2079030.0, ans=0.2 2024-08-13 08:36:42,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2079030.0, ans=0.0 2024-08-13 08:36:50,066 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 08:37:01,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2079130.0, ans=0.0 2024-08-13 08:37:02,348 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 08:37:24,107 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.436e+01 2.730e+01 3.076e+01 4.220e+01, threshold=5.460e+01, percent-clipped=0.0 2024-08-13 08:37:24,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2079330.0, ans=0.0 2024-08-13 08:37:30,807 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5050, loss[loss=0.1009, beats_loss=0.01135, ecapa_loss=0.0001522, whisper_loss=0.08799, over 22158.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01099, ecapa_loss=0.000165, whisper_loss=0.09107, over 3882517.20 frames. 
], batch size: 89, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:37:44,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2079530.0, ans=0.125 2024-08-13 08:38:12,561 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-13 08:38:30,230 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:38:37,788 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5100, loss[loss=0.1217, beats_loss=0.009877, ecapa_loss=0.0001982, whisper_loss=0.1098, over 22445.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01094, ecapa_loss=0.0001663, whisper_loss=0.09156, over 3861000.35 frames. ], batch size: 89, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:38:44,667 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 08:39:02,641 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 14 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-13 08:39:13,424 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 08:39:27,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2080230.0, ans=0.125 2024-08-13 08:39:41,785 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.313e+01 2.678e+01 2.870e+01 5.220e+01, threshold=5.355e+01, percent-clipped=0.0 2024-08-13 08:39:48,472 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5150, loss[loss=0.09303, beats_loss=0.0127, ecapa_loss=0.0001437, whisper_loss=0.0789, over 22700.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01098, ecapa_loss=0.0001644, whisper_loss=0.09117, over 3868795.66 frames. 
], batch size: 92, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:39:51,003 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0 2024-08-13 08:39:52,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2080430.0, ans=0.0 2024-08-13 08:39:55,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2080430.0, ans=0.125 2024-08-13 08:40:06,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2080530.0, ans=0.0 2024-08-13 08:40:17,393 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-13 08:40:42,356 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 08:40:57,276 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5200, loss[loss=0.099, beats_loss=0.009376, ecapa_loss=0.0001718, whisper_loss=0.0879, over 14237.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001642, whisper_loss=0.09162, over 3837647.87 frames. ], batch size: 57, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:41:08,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2080930.0, ans=0.125 2024-08-13 08:41:09,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2080930.0, ans=0.125 2024-08-13 08:41:21,604 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 24 from Vox, 15 fro AS 2024-08-13 08:41:28,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.44 vs. 
limit=15.0 2024-08-13 08:41:51,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2081230.0, ans=0.0 2024-08-13 08:41:51,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2081230.0, ans=0.125 2024-08-13 08:41:53,602 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 08:41:59,945 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.338e+01 2.575e+01 2.873e+01 5.976e+01, threshold=5.150e+01, percent-clipped=1.0 2024-08-13 08:42:00,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2081330.0, ans=0.0 2024-08-13 08:42:06,568 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5250, loss[loss=0.08566, beats_loss=0.01426, ecapa_loss=0.0001519, whisper_loss=0.06988, over 21597.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01094, ecapa_loss=0.0001625, whisper_loss=0.09135, over 3815629.09 frames. ], batch size: 92, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:42:19,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2081530.0, ans=0.0 2024-08-13 08:42:22,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.32 vs. limit=22.5 2024-08-13 08:42:25,670 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 08:42:27,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2081530.0, ans=0.125 2024-08-13 08:42:37,641 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
20 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 08:42:52,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2081730.0, ans=0.0 2024-08-13 08:43:08,682 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=22.5 2024-08-13 08:43:14,626 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5300, loss[loss=0.08053, beats_loss=0.01228, ecapa_loss=0.0001801, whisper_loss=0.06645, over 16311.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01089, ecapa_loss=0.0001637, whisper_loss=0.09152, over 3847677.88 frames. ], batch size: 68, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:43:20,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2081930.0, ans=0.125 2024-08-13 08:43:24,573 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 08:43:29,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2082030.0, ans=0.025 2024-08-13 08:43:30,059 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 08:43:34,283 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 08:43:43,894 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-13 08:43:56,977 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2024-08-13 08:44:08,850 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
30 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-13 08:44:17,209 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.493e+01 2.716e+01 3.005e+01 4.281e+01, threshold=5.431e+01, percent-clipped=0.0 2024-08-13 08:44:24,058 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5350, loss[loss=0.1263, beats_loss=0.007151, ecapa_loss=0.0001882, whisper_loss=0.1173, over 23680.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01089, ecapa_loss=0.0001635, whisper_loss=0.09172, over 3847419.76 frames. ], batch size: 93, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:44:25,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2082430.0, ans=0.125 2024-08-13 08:44:40,271 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-13 08:44:45,541 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 08:44:45,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2082530.0, ans=0.1 2024-08-13 08:44:55,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.23 vs. limit=10.0 2024-08-13 08:44:55,625 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-13 08:44:59,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2082630.0, ans=0.2 2024-08-13 08:45:22,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2082830.0, ans=0.125 2024-08-13 08:45:23,156 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-13 08:45:32,486 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5400, loss[loss=0.1173, beats_loss=0.008125, ecapa_loss=0.0001625, whisper_loss=0.1075, over 14667.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01085, ecapa_loss=0.0001633, whisper_loss=0.09186, over 3853428.57 frames. ], batch size: 54, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:45:53,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=22.5 2024-08-13 08:45:54,706 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2024-08-13 08:46:02,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2083130.0, ans=0.2 2024-08-13 08:46:28,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2083330.0, ans=0.0 2024-08-13 08:46:34,064 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.479e+01 2.684e+01 3.109e+01 1.549e+02, threshold=5.369e+01, percent-clipped=2.0 2024-08-13 08:46:34,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=2083330.0, ans=15.0 2024-08-13 08:46:40,895 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5450, loss[loss=0.08192, beats_loss=0.01225, ecapa_loss=0.0001699, whisper_loss=0.06797, over 20767.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01084, ecapa_loss=0.0001636, whisper_loss=0.09147, over 3830098.45 frames. ], batch size: 86, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:46:50,918 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
36 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 08:46:59,243 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 08:47:04,562 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:47:08,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2083530.0, ans=0.125 2024-08-13 08:47:12,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2083630.0, ans=0.2 2024-08-13 08:47:27,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2083730.0, ans=0.07 2024-08-13 08:47:31,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2083730.0, ans=0.1 2024-08-13 08:47:31,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2083730.0, ans=0.0 2024-08-13 08:47:32,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2083730.0, ans=0.125 2024-08-13 08:47:59,006 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5500, loss[loss=0.1051, beats_loss=0.01099, ecapa_loss=0.0001204, whisper_loss=0.09288, over 22033.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01086, ecapa_loss=0.0001626, whisper_loss=0.09166, over 3840202.11 frames. ], batch size: 82, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:47:59,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2083930.0, ans=0.125 2024-08-13 08:48:26,259 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 08:48:43,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.91 vs. limit=22.5 2024-08-13 08:48:56,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2084230.0, ans=0.125 2024-08-13 08:49:09,371 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-08-13 08:49:16,873 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.483e+01 2.752e+01 3.066e+01 5.816e+01, threshold=5.504e+01, percent-clipped=1.0 2024-08-13 08:49:26,429 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5550, loss[loss=0.08729, beats_loss=0.01147, ecapa_loss=0.0001361, whisper_loss=0.07446, over 16546.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001636, whisper_loss=0.0912, over 3863207.17 frames. ], batch size: 67, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:49:26,995 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.032e+01 2024-08-13 08:49:32,210 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 08:50:01,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2084530.0, ans=0.1 2024-08-13 08:50:04,940 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. 
limit=15.0 2024-08-13 08:50:24,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2084730.0, ans=0.125 2024-08-13 08:50:26,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2084730.0, ans=0.2 2024-08-13 08:50:26,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2084730.0, ans=0.125 2024-08-13 08:50:49,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2084830.0, ans=0.07 2024-08-13 08:50:59,721 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5600, loss[loss=0.1005, beats_loss=0.009206, ecapa_loss=0.0001916, whisper_loss=0.08939, over 19542.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01088, ecapa_loss=0.0001648, whisper_loss=0.09146, over 3867574.10 frames. ], batch size: 81, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:51:19,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2085030.0, ans=0.2 2024-08-13 08:51:21,687 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. 
limit=6.0 2024-08-13 08:51:23,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2085030.0, ans=0.125 2024-08-13 08:51:45,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2085130.0, ans=0.0 2024-08-13 08:52:04,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2085230.0, ans=0.125 2024-08-13 08:52:30,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2085330.0, ans=0.1 2024-08-13 08:52:33,292 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.389e+01 2.717e+01 3.076e+01 5.909e+01, threshold=5.434e+01, percent-clipped=1.0 2024-08-13 08:52:43,383 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5650, loss[loss=0.118, beats_loss=0.01317, ecapa_loss=0.0001257, whisper_loss=0.1036, over 22809.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01088, ecapa_loss=0.0001655, whisper_loss=0.09094, over 3867385.17 frames. ], batch size: 87, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:52:49,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2085430.0, ans=0.05 2024-08-13 08:52:51,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2085430.0, ans=0.125 2024-08-13 08:53:12,997 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 08:53:34,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-08-13 08:53:44,666 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 08:53:45,252 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.13 vs. limit=12.0 2024-08-13 08:54:01,258 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:54:05,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0 2024-08-13 08:54:17,629 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5700, loss[loss=0.09308, beats_loss=0.01146, ecapa_loss=0.0001379, whisper_loss=0.08024, over 22514.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0109, ecapa_loss=0.0001656, whisper_loss=0.09084, over 3876396.58 frames. ], batch size: 87, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:54:24,675 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-13 08:54:37,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2086030.0, ans=0.2 2024-08-13 08:54:48,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2086130.0, ans=0.125 2024-08-13 08:55:13,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2086230.0, ans=0.0 2024-08-13 08:55:21,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2086330.0, ans=0.0 2024-08-13 08:55:25,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.405e+01 2.655e+01 3.007e+01 4.478e+01, threshold=5.310e+01, percent-clipped=0.0 2024-08-13 08:55:26,967 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 08:55:32,369 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 08:55:33,829 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5750, loss[loss=0.1099, beats_loss=0.01091, ecapa_loss=0.0001576, whisper_loss=0.09737, over 20869.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01088, ecapa_loss=0.0001662, whisper_loss=0.0911, over 3892997.74 frames. ], batch size: 85, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:55:37,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2086430.0, ans=0.125 2024-08-13 08:56:02,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2086530.0, ans=0.2 2024-08-13 08:56:36,910 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2024-08-13 08:56:39,820 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-13 08:56:41,184 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 08:56:51,834 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5800, loss[loss=0.09031, beats_loss=0.01198, ecapa_loss=0.0001528, whisper_loss=0.07681, over 16430.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01083, ecapa_loss=0.0001665, whisper_loss=0.09103, over 3869076.57 frames. ], batch size: 66, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:57:01,316 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 08:57:18,568 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.66 vs. 
limit=22.5 2024-08-13 08:57:27,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2087130.0, ans=0.125 2024-08-13 08:57:28,954 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 08:57:37,179 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 08:57:52,962 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 18 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-13 08:57:55,908 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-13 08:57:57,785 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.58 vs. limit=12.0 2024-08-13 08:58:02,115 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.414e+01 2.686e+01 3.038e+01 9.495e+01, threshold=5.372e+01, percent-clipped=3.0 2024-08-13 08:58:09,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5850, loss[loss=0.0994, beats_loss=0.01041, ecapa_loss=0.0001927, whisper_loss=0.08706, over 20955.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001671, whisper_loss=0.09094, over 3877495.09 frames. ], batch size: 87, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:58:42,844 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 08:58:58,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.39 vs. limit=10.0 2024-08-13 08:58:59,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.30 vs. 
limit=22.5 2024-08-13 08:59:05,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2087730.0, ans=0.0 2024-08-13 08:59:13,053 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 08:59:18,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2087830.0, ans=0.1 2024-08-13 08:59:25,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2087830.0, ans=0.1 2024-08-13 08:59:28,791 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5900, loss[loss=0.1285, beats_loss=0.008429, ecapa_loss=0.0001991, whisper_loss=0.1181, over 21054.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01096, ecapa_loss=0.0001658, whisper_loss=0.09093, over 3892604.73 frames. ], batch size: 86, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:59:28,989 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 08:59:29,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2087930.0, ans=0.0 2024-08-13 08:59:33,556 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 08:59:35,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=2087930.0, ans=0.1 2024-08-13 08:59:56,740 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. 
limit=15.0 2024-08-13 09:00:03,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2088130.0, ans=0.125 2024-08-13 09:00:04,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2088130.0, ans=0.0 2024-08-13 09:00:10,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2088130.0, ans=0.125 2024-08-13 09:00:10,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2088130.0, ans=0.0 2024-08-13 09:00:21,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=15.0 2024-08-13 09:00:26,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2088230.0, ans=0.0 2024-08-13 09:00:36,599 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 09:00:39,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.419e+01 2.634e+01 3.004e+01 5.084e+01, threshold=5.268e+01, percent-clipped=0.0 2024-08-13 09:00:47,064 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 5950, loss[loss=0.08715, beats_loss=0.01296, ecapa_loss=0.0001859, whisper_loss=0.07233, over 21808.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01099, ecapa_loss=0.0001665, whisper_loss=0.09086, over 3876397.77 frames. ], batch size: 91, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:00:55,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2088430.0, ans=0.0 2024-08-13 09:00:58,397 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
15 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 09:01:07,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2088530.0, ans=0.0 2024-08-13 09:01:09,977 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 09:01:35,870 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2024-08-13 09:01:41,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2088730.0, ans=0.2 2024-08-13 09:01:43,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2088730.0, ans=0.125 2024-08-13 09:01:48,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2088730.0, ans=0.07 2024-08-13 09:02:06,999 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6000, loss[loss=0.1115, beats_loss=0.009301, ecapa_loss=0.0002272, whisper_loss=0.09992, over 20520.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01101, ecapa_loss=0.0001665, whisper_loss=0.09064, over 3931434.02 frames. ], batch size: 89, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:02:07,000 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 09:02:46,544 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on ASR_libri: loss=0.2545, beats_loss=0, ecapa_loss=0.0005583, whisper_loss=0.2489, over 922467.00 frames. 2024-08-13 09:03:04,005 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on SV_voxceleb1: loss=0.004508, beats_loss=0, ecapa_loss=0.0004508, whisper_loss=0, over 939242.00 frames. 
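A side note on how the per-record losses relate: the run config in the header sets beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0, and the logged tot_loss values are consistent with a weighted sum under those scales (e.g. for the batch-6000 record, 0.01101 + 10 × 0.0001665 + 0.09064 ≈ 0.1033). A minimal sketch under that assumption; the helper name is ours, not the icefall source:

```python
# Hedged sketch, not the icefall implementation: combine the three
# distillation losses with the scales from the run config
# (beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0).
def combine_kd_losses(beats_loss, ecapa_loss, whisper_loss,
                      beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted sum consistent with the tot_loss values printed in the log."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Component values from the "Epoch 15, batch 6000" tot_loss record:
total = combine_kd_losses(beats_loss=0.01101, ecapa_loss=0.0001665,
                          whisper_loss=0.09064)  # ≈ 0.1033
```

The factor-10 weight on ecapa_loss explains why it contributes noticeably to tot_loss despite being two orders of magnitude smaller than whisper_loss in the raw records.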
2024-08-13 09:05:03,060 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on AT_audioset: loss=0.02381, beats_loss=0.02381, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 09:05:03,064 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-13 09:05:10,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2088930.0, ans=0.125 2024-08-13 09:05:12,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.70 vs. limit=22.5 2024-08-13 09:05:16,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2089030.0, ans=0.125 2024-08-13 09:05:21,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2089030.0, ans=0.125 2024-08-13 09:05:22,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2089030.0, ans=0.1 2024-08-13 09:05:34,602 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
30 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 09:05:42,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2089130.0, ans=0.07 2024-08-13 09:05:55,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2089230.0, ans=0.0 2024-08-13 09:06:03,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2089330.0, ans=0.125 2024-08-13 09:06:12,154 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.429e+01 2.733e+01 3.006e+01 6.424e+01, threshold=5.466e+01, percent-clipped=1.0 2024-08-13 09:06:19,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6050, loss[loss=0.09798, beats_loss=0.01325, ecapa_loss=0.0001444, whisper_loss=0.08328, over 22607.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01098, ecapa_loss=0.0001657, whisper_loss=0.09076, over 3932491.25 frames. ], batch size: 91, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:06:47,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0 2024-08-13 09:07:17,738 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 09:07:18,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2089730.0, ans=0.125 2024-08-13 09:07:29,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=12.0 2024-08-13 09:07:38,491 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 09:07:41,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6100, loss[loss=0.1016, beats_loss=0.0119, ecapa_loss=0.0001316, whisper_loss=0.08839, over 20676.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01091, ecapa_loss=0.000166, whisper_loss=0.0908, over 3897621.80 frames. ], batch size: 81, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:07:48,683 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 09:07:51,545 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-13 09:08:01,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2090030.0, ans=0.0 2024-08-13 09:08:01,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2090030.0, ans=0.1 2024-08-13 09:08:19,951 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 09:08:22,204 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 09:08:27,084 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-13 09:08:30,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2090230.0, ans=0.015 2024-08-13 09:08:34,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.12 vs. 
limit=15.0 2024-08-13 09:08:55,609 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.306e+01 2.537e+01 2.839e+01 1.271e+02, threshold=5.074e+01, percent-clipped=1.0 2024-08-13 09:09:03,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6150, loss[loss=0.09167, beats_loss=0.01257, ecapa_loss=0.0001489, whisper_loss=0.07761, over 21245.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01109, ecapa_loss=0.0001653, whisper_loss=0.08973, over 3911759.51 frames. ], batch size: 87, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:09:13,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2090430.0, ans=0.0 2024-08-13 09:09:26,246 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0 2024-08-13 09:09:31,639 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 09:09:41,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2090630.0, ans=0.125 2024-08-13 09:09:44,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=15.0 2024-08-13 09:09:49,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.94 vs. limit=15.0 2024-08-13 09:10:06,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2090830.0, ans=0.0 2024-08-13 09:10:11,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2090830.0, ans=0.125 2024-08-13 09:10:20,462 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 09:10:20,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2090830.0, ans=0.125 2024-08-13 09:10:23,374 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6200, loss[loss=0.1104, beats_loss=0.008545, ecapa_loss=0.0001774, whisper_loss=0.1001, over 21763.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01096, ecapa_loss=0.0001653, whisper_loss=0.09032, over 3897000.81 frames. ], batch size: 87, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:10:45,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2091030.0, ans=0.0 2024-08-13 09:10:54,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2091030.0, ans=0.0 2024-08-13 09:11:10,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.42 vs. limit=10.0 2024-08-13 09:11:17,650 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 09:11:18,902 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 09:11:34,092 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 09:11:37,971 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.446e+01 2.761e+01 3.049e+01 5.001e+01, threshold=5.523e+01, percent-clipped=0.0 2024-08-13 09:11:45,298 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6250, loss[loss=0.08469, beats_loss=0.01212, ecapa_loss=0.0001189, whisper_loss=0.07138, over 15519.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01095, ecapa_loss=0.0001645, whisper_loss=0.0899, over 3877390.66 frames. 
], batch size: 58, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:11:49,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2091430.0, ans=0.0 2024-08-13 09:11:54,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2091430.0, ans=0.125 2024-08-13 09:12:04,033 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 19 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 09:12:06,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2091530.0, ans=0.2 2024-08-13 09:12:07,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2091530.0, ans=0.2 2024-08-13 09:12:12,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=12.0 2024-08-13 09:12:15,373 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 09:12:30,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2091630.0, ans=0.125 2024-08-13 09:12:30,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2091630.0, ans=0.2 2024-08-13 09:12:39,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2024-08-13 09:13:03,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2091830.0, ans=0.125 2024-08-13 09:13:05,960 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6300, loss[loss=0.1042, beats_loss=0.01344, ecapa_loss=0.0001208, whisper_loss=0.08953, over 21453.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01096, ecapa_loss=0.0001645, whisper_loss=0.08937, over 3863922.41 frames. ], batch size: 84, lr: 4.19e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:13:22,864 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 09:13:34,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0 2024-08-13 09:14:10,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2092330.0, ans=0.125 2024-08-13 09:14:12,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2092330.0, ans=0.07 2024-08-13 09:14:16,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.468e+01 2.785e+01 3.208e+01 1.167e+02, threshold=5.571e+01, percent-clipped=1.0 2024-08-13 09:14:24,602 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6350, loss[loss=0.09182, beats_loss=0.01301, ecapa_loss=0.0001193, whisper_loss=0.07762, over 21614.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01096, ecapa_loss=0.0001637, whisper_loss=0.08959, over 3868093.08 frames. ], batch size: 87, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:14:30,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2092430.0, ans=0.04949747468305833 2024-08-13 09:14:52,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2092530.0, ans=0.1 2024-08-13 09:14:52,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2092530.0, ans=0.125 2024-08-13 09:14:54,282 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
26 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 09:14:55,835 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 09:15:00,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2092630.0, ans=0.09899494936611666 2024-08-13 09:15:09,235 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 09:15:11,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2092730.0, ans=0.0 2024-08-13 09:15:12,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2092730.0, ans=0.125 2024-08-13 09:15:23,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2092830.0, ans=0.0 2024-08-13 09:15:32,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2092830.0, ans=0.1 2024-08-13 09:15:34,404 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 09:15:35,530 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6400, loss[loss=0.1126, beats_loss=0.009838, ecapa_loss=0.0001495, whisper_loss=0.1013, over 15904.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0109, ecapa_loss=0.0001632, whisper_loss=0.09063, over 3880975.19 frames. 
], batch size: 59, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:15:38,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2092930.0, ans=0.125
2024-08-13 09:15:40,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2092930.0, ans=0.125
2024-08-13 09:15:55,101 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 from AS
2024-08-13 09:16:04,621 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0
2024-08-13 09:16:22,509 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 from AS
2024-08-13 09:16:34,589 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.483e+01 2.753e+01 3.245e+01 5.103e+01, threshold=5.505e+01, percent-clipped=0.0
2024-08-13 09:16:38,905 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 from AS
2024-08-13 09:16:41,191 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6450, loss[loss=0.08809, beats_loss=0.01397, ecapa_loss=0.0001557, whisper_loss=0.07257, over 20169.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01096, ecapa_loss=0.000164, whisper_loss=0.0901, over 3882255.78 frames. ], batch size: 82, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:16:50,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2093430.0, ans=0.2
2024-08-13 09:17:02,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2093530.0, ans=0.0
2024-08-13 09:17:06,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2093630.0, ans=0.125
2024-08-13 09:17:31,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2093730.0, ans=0.125
2024-08-13 09:17:34,287 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 from AS
2024-08-13 09:17:36,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.33 vs. limit=10.0
2024-08-13 09:17:46,590 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6500, loss[loss=0.1237, beats_loss=0.01004, ecapa_loss=0.0001645, whisper_loss=0.1121, over 22905.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01079, ecapa_loss=0.0001663, whisper_loss=0.09176, over 3892653.58 frames. ], batch size: 90, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:17:48,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2093930.0, ans=0.125
2024-08-13 09:17:51,292 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.759e+00
2024-08-13 09:17:55,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2093930.0, ans=0.125
2024-08-13 09:18:05,207 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 from AS
2024-08-13 09:18:08,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2094030.0, ans=0.0
2024-08-13 09:18:25,324 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 18 from LS+wenet, 23 from Vox, 48 from AS
2024-08-13 09:18:36,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2094230.0, ans=0.0
2024-08-13 09:18:37,554 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 15 from Vox, 38 from AS
2024-08-13 09:18:41,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.26 vs. limit=22.5
2024-08-13 09:18:46,271 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.544e+01 2.898e+01 3.309e+01 5.602e+01, threshold=5.795e+01, percent-clipped=1.0
2024-08-13 09:18:52,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6550, loss[loss=0.09176, beats_loss=0.01179, ecapa_loss=0.0001776, whisper_loss=0.07819, over 21615.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01078, ecapa_loss=0.0001653, whisper_loss=0.09174, over 3903653.24 frames. ], batch size: 91, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:18:57,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0
2024-08-13 09:19:10,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2094530.0, ans=0.125
2024-08-13 09:19:25,778 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 from AS
2024-08-13 09:19:40,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2094730.0, ans=0.0
2024-08-13 09:19:42,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2094730.0, ans=0.05
2024-08-13 09:19:51,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2094830.0, ans=0.125
2024-08-13 09:19:54,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2094830.0, ans=0.1
2024-08-13 09:19:55,111 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 from AS
2024-08-13 09:19:57,589 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6600, loss[loss=0.0886, beats_loss=0.0124, ecapa_loss=0.0001764, whisper_loss=0.07444, over 21661.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01076, ecapa_loss=0.000167, whisper_loss=0.09232, over 3956857.01 frames. ], batch size: 93, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:20:01,834 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 18 from Vox, 41 from AS
2024-08-13 09:20:03,647 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0
2024-08-13 09:20:35,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. limit=6.0
2024-08-13 09:20:42,954 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 from AS
2024-08-13 09:20:49,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2095330.0, ans=0.1
2024-08-13 09:20:49,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.61 vs. limit=10.0
2024-08-13 09:20:56,668 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.422e+01 2.623e+01 3.004e+01 7.541e+01, threshold=5.247e+01, percent-clipped=2.0
2024-08-13 09:21:03,140 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=22.5
2024-08-13 09:21:03,436 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6650, loss[loss=0.09294, beats_loss=0.009857, ecapa_loss=0.000161, whisper_loss=0.08147, over 17730.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01081, ecapa_loss=0.0001677, whisper_loss=0.09229, over 3955540.00 frames. ], batch size: 68, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:21:11,524 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 17 from Vox, 41 from AS
2024-08-13 09:21:39,056 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS
2024-08-13 09:21:47,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2095730.0, ans=0.04949747468305833
2024-08-13 09:21:56,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2095830.0, ans=0.1
2024-08-13 09:21:57,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2095830.0, ans=0.1
2024-08-13 09:22:06,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2095830.0, ans=0.0
2024-08-13 09:22:09,186 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6700, loss[loss=0.1141, beats_loss=0.006474, ecapa_loss=0.0001985, whisper_loss=0.1057, over 16222.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01072, ecapa_loss=0.0001674, whisper_loss=0.09275, over 3945861.68 frames. ], batch size: 60, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:22:13,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2095930.0, ans=0.0
2024-08-13 09:22:25,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2096030.0, ans=0.0
2024-08-13 09:22:38,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=12.0
2024-08-13 09:22:46,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2096130.0, ans=0.0
2024-08-13 09:22:58,987 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 from AS
2024-08-13 09:23:08,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.450e+01 2.665e+01 3.008e+01 5.668e+01, threshold=5.331e+01, percent-clipped=2.0
2024-08-13 09:23:14,789 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6750, loss[loss=0.105, beats_loss=0.01198, ecapa_loss=0.000161, whisper_loss=0.09142, over 20913.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01072, ecapa_loss=0.000168, whisper_loss=0.09256, over 3923346.21 frames. ], batch size: 88, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:23:31,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2096530.0, ans=10.0
2024-08-13 09:23:37,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0
2024-08-13 09:23:38,865 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 from AS
2024-08-13 09:23:39,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2096530.0, ans=0.125
2024-08-13 09:23:59,209 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0
2024-08-13 09:24:11,567 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 from AS
2024-08-13 09:24:19,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2096930.0, ans=0.0
2024-08-13 09:24:20,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6800, loss[loss=0.1038, beats_loss=0.0094, ecapa_loss=0.000138, whisper_loss=0.09298, over 17657.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01072, ecapa_loss=0.0001673, whisper_loss=0.09141, over 3892678.76 frames. ], batch size: 67, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:25:19,892 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 13 from Vox, 32 from AS
2024-08-13 09:25:20,881 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.429e+01 2.619e+01 3.014e+01 5.255e+01, threshold=5.237e+01, percent-clipped=0.0
2024-08-13 09:25:27,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.77 vs. limit=6.0
2024-08-13 09:25:27,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6850, loss[loss=0.1108, beats_loss=0.01082, ecapa_loss=0.0001653, whisper_loss=0.09835, over 23064.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001663, whisper_loss=0.09079, over 3835363.75 frames. ], batch size: 93, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:25:30,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=22.5
2024-08-13 09:25:34,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2097430.0, ans=0.125
2024-08-13 09:25:44,882 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 from AS
2024-08-13 09:25:53,937 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 20 from LS+wenet, 33 from Vox, 42 from AS
2024-08-13 09:25:58,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2097630.0, ans=0.125
2024-08-13 09:26:03,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2097630.0, ans=0.2
2024-08-13 09:26:03,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.07 vs. limit=12.0
2024-08-13 09:26:09,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2097730.0, ans=0.125
2024-08-13 09:26:17,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2097730.0, ans=0.2
2024-08-13 09:26:33,024 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6900, loss[loss=0.1174, beats_loss=0.01064, ecapa_loss=0.0001841, whisper_loss=0.1049, over 19503.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01086, ecapa_loss=0.0001661, whisper_loss=0.09055, over 3866076.56 frames. ], batch size: 78, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:26:36,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2097930.0, ans=10.0
2024-08-13 09:26:39,183 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 09:26:40,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2097930.0, ans=0.125
2024-08-13 09:26:42,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2097930.0, ans=0.0
2024-08-13 09:26:42,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2097930.0, ans=0.0
2024-08-13 09:26:54,112 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.58 vs. limit=10.0
2024-08-13 09:26:56,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2098030.0, ans=0.125
2024-08-13 09:27:01,869 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.88 vs. limit=10.0
2024-08-13 09:27:02,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2098130.0, ans=0.125
2024-08-13 09:27:04,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2098130.0, ans=0.125
2024-08-13 09:27:04,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0
2024-08-13 09:27:05,838 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=15.0
2024-08-13 09:27:05,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2098130.0, ans=15.0
2024-08-13 09:27:06,935 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 from AS
2024-08-13 09:27:08,246 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 13 from Vox, 36 from AS
2024-08-13 09:27:32,672 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.455e+01 2.903e+01 3.270e+01 5.847e+01, threshold=5.807e+01, percent-clipped=1.0
2024-08-13 09:27:39,174 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 6950, loss[loss=0.1185, beats_loss=0.008529, ecapa_loss=0.0001809, whisper_loss=0.1082, over 19755.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0109, ecapa_loss=0.0001653, whisper_loss=0.09064, over 3830130.07 frames. ], batch size: 75, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:27:49,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2098430.0, ans=0.0
2024-08-13 09:27:50,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2098430.0, ans=0.0
2024-08-13 09:27:51,219 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 from AS
2024-08-13 09:27:57,895 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 22 from Vox, 26 from AS
2024-08-13 09:28:30,314 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 09:28:44,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.37 vs. limit=5.0
2024-08-13 09:28:44,327 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7000, loss[loss=0.1087, beats_loss=0.008291, ecapa_loss=0.0002072, whisper_loss=0.0983, over 15289.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01094, ecapa_loss=0.000166, whisper_loss=0.09024, over 3841411.89 frames. ], batch size: 60, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:28:51,204 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 from AS
2024-08-13 09:29:02,275 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.299e-02
2024-08-13 09:29:11,978 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 11 from Vox, 27 from AS
2024-08-13 09:29:29,833 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 27 from Vox, 29 from AS
2024-08-13 09:29:38,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2099330.0, ans=0.1
2024-08-13 09:29:42,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.399e+01 2.678e+01 3.214e+01 5.831e+01, threshold=5.356e+01, percent-clipped=1.0
2024-08-13 09:29:43,524 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.34 vs. limit=12.0
2024-08-13 09:29:48,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2099430.0, ans=6.0
2024-08-13 09:29:49,623 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7050, loss[loss=0.07428, beats_loss=0.01244, ecapa_loss=0.000135, whisper_loss=0.06049, over 15790.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01096, ecapa_loss=0.0001656, whisper_loss=0.09007, over 3837727.31 frames. ], batch size: 61, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:30:01,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2099430.0, ans=0.125
2024-08-13 09:30:16,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2099530.0, ans=0.0
2024-08-13 09:30:19,954 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 from AS
2024-08-13 09:30:20,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.68 vs. limit=22.5
2024-08-13 09:30:21,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0
2024-08-13 09:30:26,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2099630.0, ans=0.0
2024-08-13 09:30:46,795 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.531e+05
2024-08-13 09:30:53,881 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 17 from Vox, 49 from AS
2024-08-13 09:31:00,368 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7100, loss[loss=0.08803, beats_loss=0.01289, ecapa_loss=0.0001418, whisper_loss=0.07372, over 22568.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01098, ecapa_loss=0.0001645, whisper_loss=0.09023, over 3862865.39 frames. ], batch size: 92, lr: 4.19e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:31:39,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2100130.0, ans=0.0
2024-08-13 09:31:42,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2100130.0, ans=0.05
2024-08-13 09:31:42,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2100130.0, ans=0.125
2024-08-13 09:32:03,045 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 25 from Vox, 40 from AS
2024-08-13 09:32:08,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2100330.0, ans=0.1
2024-08-13 09:32:08,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.488e+01 2.756e+01 3.074e+01 1.860e+02, threshold=5.512e+01, percent-clipped=2.0
2024-08-13 09:32:14,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2100430.0, ans=0.125
2024-08-13 09:32:14,904 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7150, loss[loss=0.14, beats_loss=0.00724, ecapa_loss=0.0001557, whisper_loss=0.1312, over 22083.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01089, ecapa_loss=0.0001639, whisper_loss=0.0913, over 3882587.02 frames. ], batch size: 77, lr: 4.19e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:33:02,812 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 09:33:02,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2100730.0, ans=0.0
2024-08-13 09:33:16,486 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 from AS
2024-08-13 09:33:29,680 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7200, loss[loss=0.08591, beats_loss=0.01061, ecapa_loss=0.0001859, whisper_loss=0.07344, over 13178.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01092, ecapa_loss=0.0001645, whisper_loss=0.09112, over 3883085.56 frames. ], batch size: 54, lr: 4.19e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:33:33,338 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.165e+01
2024-08-13 09:33:38,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2100930.0, ans=0.125
2024-08-13 09:33:58,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2101130.0, ans=0.2
2024-08-13 09:34:16,510 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 from AS
2024-08-13 09:34:18,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2101230.0, ans=0.0
2024-08-13 09:34:38,281 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.408e+01 2.663e+01 2.960e+01 8.950e+01, threshold=5.327e+01, percent-clipped=1.0
2024-08-13 09:34:44,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7250, loss[loss=0.09454, beats_loss=0.01187, ecapa_loss=0.0001358, whisper_loss=0.08131, over 23586.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01091, ecapa_loss=0.0001643, whisper_loss=0.09129, over 3891448.29 frames. ], batch size: 93, lr: 4.19e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:34:48,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=12.0
2024-08-13 09:34:51,399 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 17 from LS+wenet, 30 from Vox, 32 from AS
2024-08-13 09:34:52,648 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 from AS
2024-08-13 09:34:55,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2101430.0, ans=0.125
2024-08-13 09:35:05,365 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 from AS
2024-08-13 09:35:17,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=2101630.0, ans=0.2
2024-08-13 09:35:34,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2101730.0, ans=0.125
2024-08-13 09:35:39,449 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0
2024-08-13 09:35:46,371 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 13 from Vox, 29 from AS
2024-08-13 09:35:47,699 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 12 from LS+wenet, 19 from Vox, 36 from AS
2024-08-13 09:35:48,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2101830.0, ans=0.2
2024-08-13 09:35:49,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2101830.0, ans=0.125
2024-08-13 09:35:53,659 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 41 from LS+wenet, 19 from Vox, 34 from AS
2024-08-13 09:35:59,579 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7300, loss[loss=0.09884, beats_loss=0.01059, ecapa_loss=0.0001632, whisper_loss=0.08662, over 18791.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01084, ecapa_loss=0.0001643, whisper_loss=0.09199, over 3888665.69 frames. ], batch size: 77, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:36:06,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2101930.0, ans=0.125
2024-08-13 09:36:09,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2101930.0, ans=0.1
2024-08-13 09:36:11,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2101930.0, ans=0.0
2024-08-13 09:36:15,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2102030.0, ans=0.05
2024-08-13 09:36:18,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2102030.0, ans=0.0
2024-08-13 09:36:34,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2102130.0, ans=0.125
2024-08-13 09:36:46,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2102230.0, ans=0.0
2024-08-13 09:36:51,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2102230.0, ans=0.0
2024-08-13 09:36:53,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2102230.0, ans=0.125
2024-08-13 09:37:06,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2102330.0, ans=0.125
2024-08-13 09:37:07,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2102330.0, ans=0.04949747468305833
2024-08-13 09:37:08,583 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.467e+01 2.644e+01 2.965e+01 8.104e+01, threshold=5.287e+01, percent-clipped=3.0
2024-08-13 09:37:10,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2102330.0, ans=0.0
2024-08-13 09:37:14,139 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7350, loss[loss=0.08885, beats_loss=0.01273, ecapa_loss=0.0001192, whisper_loss=0.07492, over 20379.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01083, ecapa_loss=0.0001656, whisper_loss=0.09242, over 3914229.96 frames. ], batch size: 80, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:37:17,642 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 10 from Vox, 32 from AS
2024-08-13 09:38:17,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2102830.0, ans=0.1
2024-08-13 09:38:18,621 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 28 from Vox, 21 from AS
2024-08-13 09:38:22,678 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 from AS
2024-08-13 09:38:22,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2102830.0, ans=0.0
2024-08-13 09:38:28,996 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7400, loss[loss=0.09524, beats_loss=0.01182, ecapa_loss=0.0001795, whisper_loss=0.08163, over 18165.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01086, ecapa_loss=0.0001662, whisper_loss=0.09162, over 3887161.18 frames. ], batch size: 75, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:38:29,641 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 from AS
2024-08-13 09:38:47,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.98 vs. limit=15.0
2024-08-13 09:38:52,747 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 13 from Vox, 40 from AS
2024-08-13 09:39:07,422 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 21 from Vox, 21 from AS
2024-08-13 09:39:18,633 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 12 from LS+wenet, 19 from Vox, 30 from AS
2024-08-13 09:39:29,342 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 from AS
2024-08-13 09:39:30,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2103330.0, ans=0.1
2024-08-13 09:39:31,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2103330.0, ans=0.2
2024-08-13 09:39:40,625 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.669e+01 2.473e+01 2.699e+01 3.080e+01 4.653e+01, threshold=5.397e+01, percent-clipped=0.0
2024-08-13 09:39:47,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7450, loss[loss=0.1169, beats_loss=0.01261, ecapa_loss=0.0001455, whisper_loss=0.1028, over 20650.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0109, ecapa_loss=0.0001646, whisper_loss=0.09167, over 3880385.63 frames. ], batch size: 79, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:39:50,484 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 from AS
2024-08-13 09:40:25,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2103630.0, ans=0.125
2024-08-13 09:40:28,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2103630.0, ans=0.0
2024-08-13 09:40:39,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2103730.0, ans=0.04949747468305833
2024-08-13 09:40:39,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2103730.0, ans=0.05
2024-08-13 09:40:51,615 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 from AS
2024-08-13 09:41:03,525 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7500, loss[loss=0.08896, beats_loss=0.01249, ecapa_loss=0.0001608, whisper_loss=0.07486, over 15244.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01085, ecapa_loss=0.000165, whisper_loss=0.09139, over 3867729.98 frames. ], batch size: 65, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:41:11,045 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 from AS
2024-08-13 09:41:15,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2103930.0, ans=0.125
2024-08-13 09:42:11,385 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.360e+01 2.624e+01 2.937e+01 1.240e+02, threshold=5.248e+01, percent-clipped=1.0
2024-08-13 09:42:16,164 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 22 from Vox, 24 from AS
2024-08-13 09:42:17,258 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7550, loss[loss=0.1269, beats_loss=0.008953, ecapa_loss=0.0001894, whisper_loss=0.116, over 19216.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01083, ecapa_loss=0.0001659, whisper_loss=0.09181, over 3864259.69 frames. ], batch size: 74, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:42:30,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=2104530.0, ans=0.2
2024-08-13 09:42:35,757 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 20 from Vox, 35 from AS
2024-08-13 09:43:00,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.95 vs. limit=15.0
2024-08-13 09:43:20,951 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.457e+01
2024-08-13 09:43:30,072 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 09:43:32,212 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7600, loss[loss=0.1136, beats_loss=0.008499, ecapa_loss=0.000203, whisper_loss=0.103, over 17247.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01077, ecapa_loss=0.0001664, whisper_loss=0.09116, over 3821657.28 frames. ], batch size: 65, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:43:33,540 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0
2024-08-13 09:43:39,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2104930.0, ans=0.0
2024-08-13 09:43:40,150 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 18 from LS+wenet, 33 from Vox, 40 from AS
2024-08-13 09:43:45,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2104930.0, ans=0.0
2024-08-13 09:43:50,126 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 from AS
2024-08-13 09:43:58,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2105030.0, ans=0.125
2024-08-13 09:44:00,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0
2024-08-13 09:44:29,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2105230.0, ans=0.0
2024-08-13 09:44:29,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2105230.0, ans=0.0
2024-08-13 09:44:34,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2105330.0, ans=0.0
2024-08-13 09:44:41,100 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.611e+01 2.428e+01 2.721e+01 3.053e+01 1.709e+02, threshold=5.443e+01, percent-clipped=2.0
2024-08-13 09:44:42,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2105330.0, ans=0.0
2024-08-13 09:44:46,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7650, loss[loss=0.08292, beats_loss=0.01095, ecapa_loss=0.0001863, whisper_loss=0.07011, over 15561.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01083, ecapa_loss=0.0001661, whisper_loss=0.09066, over 3813974.64 frames. ], batch size: 67, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:45:08,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2105530.0, ans=0.125
2024-08-13 09:45:19,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2105630.0, ans=0.1
2024-08-13 09:45:24,948 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 from AS
2024-08-13 09:45:26,593 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 from AS
2024-08-13 09:45:27,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.18 vs.
limit=15.0 2024-08-13 09:45:31,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2105730.0, ans=0.125 2024-08-13 09:45:51,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2105830.0, ans=0.2 2024-08-13 09:45:55,480 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-13 09:46:01,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2105930.0, ans=0.125 2024-08-13 09:46:02,633 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7700, loss[loss=0.1161, beats_loss=0.01035, ecapa_loss=0.0001597, whisper_loss=0.1042, over 16738.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01081, ecapa_loss=0.0001659, whisper_loss=0.0908, over 3839216.71 frames. ], batch size: 65, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:46:21,009 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-08-13 09:46:22,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2106030.0, ans=0.125 2024-08-13 09:46:28,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2106030.0, ans=0.09899494936611666 2024-08-13 09:46:38,210 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 09:46:41,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2106130.0, ans=0.125 2024-08-13 09:46:44,575 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 09:46:49,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.86 vs. limit=15.0 2024-08-13 09:46:51,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2106230.0, ans=0.125 2024-08-13 09:46:55,531 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 09:47:01,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2106230.0, ans=0.1 2024-08-13 09:47:10,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2106330.0, ans=0.2 2024-08-13 09:47:12,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.458e+01 2.712e+01 3.112e+01 4.115e+01, threshold=5.423e+01, percent-clipped=0.0 2024-08-13 09:47:13,131 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 09:47:16,943 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 09:47:18,025 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7750, loss[loss=0.08329, beats_loss=0.01205, ecapa_loss=0.0001405, whisper_loss=0.06983, over 14330.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0108, ecapa_loss=0.0001647, whisper_loss=0.09077, over 3857170.78 frames. 
], batch size: 57, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:47:22,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2106430.0, ans=0.1 2024-08-13 09:47:28,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2106430.0, ans=0.2 2024-08-13 09:47:29,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2106430.0, ans=0.1 2024-08-13 09:47:32,858 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=15.0 2024-08-13 09:47:51,773 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 09:47:53,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2106630.0, ans=0.125 2024-08-13 09:47:56,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2106630.0, ans=0.0 2024-08-13 09:48:15,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2106730.0, ans=0.125 2024-08-13 09:48:17,785 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-13 09:48:20,541 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
10 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-13 09:48:22,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2106830.0, ans=0.1 2024-08-13 09:48:31,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2106830.0, ans=0.0 2024-08-13 09:48:35,000 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7800, loss[loss=0.09743, beats_loss=0.01344, ecapa_loss=0.000131, whisper_loss=0.08268, over 21241.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01087, ecapa_loss=0.000164, whisper_loss=0.09005, over 3860989.15 frames. ], batch size: 85, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:49:02,830 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 09:49:03,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2107030.0, ans=0.2 2024-08-13 09:49:13,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=15.0 2024-08-13 09:49:20,643 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 09:49:22,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-13 09:49:23,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2107230.0, ans=0.125 2024-08-13 09:49:28,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2107230.0, ans=10.0 2024-08-13 09:49:33,302 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.77 vs. 
limit=15.0 2024-08-13 09:49:45,233 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.478e+01 2.776e+01 3.061e+01 6.531e+01, threshold=5.553e+01, percent-clipped=2.0 2024-08-13 09:49:45,366 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-13 09:49:51,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7850, loss[loss=0.07675, beats_loss=0.01501, ecapa_loss=0.0001274, whisper_loss=0.06046, over 16624.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01093, ecapa_loss=0.0001637, whisper_loss=0.08975, over 3859064.96 frames. ], batch size: 68, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:50:15,435 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-13 09:50:26,733 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 33 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 09:50:37,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2107730.0, ans=0.125 2024-08-13 09:50:42,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2107730.0, ans=0.125 2024-08-13 09:50:45,369 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.215e-01 2024-08-13 09:50:45,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2107730.0, ans=0.2 2024-08-13 09:50:46,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2107730.0, ans=0.125 2024-08-13 09:50:51,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2107830.0, ans=0.125 2024-08-13 09:51:08,220 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7900, 
loss[loss=0.08648, beats_loss=0.01378, ecapa_loss=0.0001613, whisper_loss=0.07109, over 18794.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01103, ecapa_loss=0.0001624, whisper_loss=0.08981, over 3869496.24 frames. ], batch size: 78, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:51:15,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-13 09:51:17,892 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 30 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-13 09:51:27,740 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 09:51:49,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2108130.0, ans=0.125 2024-08-13 09:51:55,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.87 vs. limit=15.0 2024-08-13 09:52:11,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2108330.0, ans=0.125 2024-08-13 09:52:14,513 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2024-08-13 09:52:16,624 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0 2024-08-13 09:52:17,137 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 09:52:19,999 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.346e+01 2.630e+01 3.151e+01 7.356e+01, threshold=5.260e+01, percent-clipped=1.0 2024-08-13 09:52:26,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 7950, loss[loss=0.08432, beats_loss=0.01235, ecapa_loss=0.0001692, whisper_loss=0.07028, over 15059.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.011, ecapa_loss=0.0001635, whisper_loss=0.08953, over 3842857.59 frames. ], batch size: 63, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:52:33,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2108430.0, ans=0.1 2024-08-13 09:52:34,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.70 vs. limit=15.0 2024-08-13 09:52:58,211 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 09:53:17,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2108730.0, ans=0.0 2024-08-13 09:53:18,711 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 09:53:40,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2108830.0, ans=0.125 2024-08-13 09:53:45,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8000, loss[loss=0.09778, beats_loss=0.01333, ecapa_loss=0.0001735, whisper_loss=0.08272, over 22018.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01096, ecapa_loss=0.0001631, whisper_loss=0.09077, over 3875996.82 frames. 
], batch size: 91, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:54:04,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2109030.0, ans=0.125 2024-08-13 09:54:25,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.08 vs. limit=6.0 2024-08-13 09:54:28,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2109130.0, ans=0.125 2024-08-13 09:54:37,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2109230.0, ans=0.09899494936611666 2024-08-13 09:54:56,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.293e+01 2.578e+01 2.886e+01 4.471e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-13 09:55:02,802 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8050, loss[loss=0.1197, beats_loss=0.00956, ecapa_loss=0.0001501, whisper_loss=0.1086, over 21290.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01094, ecapa_loss=0.0001645, whisper_loss=0.09077, over 3852147.07 frames. ], batch size: 81, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:55:15,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.94 vs. limit=10.0 2024-08-13 09:55:24,681 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 27 from Vox, 16 fro AS 2024-08-13 09:55:26,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2109530.0, ans=0.0 2024-08-13 09:55:30,697 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-13 09:55:37,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2109630.0, ans=0.125 2024-08-13 09:55:46,838 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.71 vs. limit=15.0 2024-08-13 09:55:56,315 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 09:56:01,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2109730.0, ans=0.125 2024-08-13 09:56:20,582 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8100, loss[loss=0.123, beats_loss=0.009968, ecapa_loss=0.0001923, whisper_loss=0.1111, over 20597.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01088, ecapa_loss=0.0001649, whisper_loss=0.0908, over 3854196.21 frames. ], batch size: 84, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:56:24,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2109930.0, ans=0.2 2024-08-13 09:56:35,583 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-13 09:56:50,663 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 09:56:52,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2110130.0, ans=0.0 2024-08-13 09:56:59,764 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 09:57:09,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2110230.0, ans=0.125 2024-08-13 09:57:20,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2110330.0, ans=0.125 2024-08-13 09:57:25,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2110330.0, ans=0.2 2024-08-13 09:57:30,352 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.445e+01 2.691e+01 3.022e+01 6.409e+01, threshold=5.382e+01, percent-clipped=1.0 2024-08-13 09:57:32,444 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-13 09:57:36,556 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.17 vs. limit=10.0 2024-08-13 09:57:36,930 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8150, loss[loss=0.1231, beats_loss=0.009811, ecapa_loss=0.0001627, whisper_loss=0.1117, over 19100.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0108, ecapa_loss=0.0001662, whisper_loss=0.09114, over 3837311.16 frames. ], batch size: 72, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:57:37,713 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0 2024-08-13 09:57:51,357 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 09:57:52,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2110530.0, ans=0.05 2024-08-13 09:57:54,030 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 09:57:59,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2110530.0, ans=0.0 2024-08-13 09:58:16,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2110630.0, ans=0.125 2024-08-13 09:58:52,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2110830.0, ans=0.1 2024-08-13 09:58:54,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8200, loss[loss=0.1182, beats_loss=0.01043, ecapa_loss=0.00013, whisper_loss=0.1065, over 21730.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01084, ecapa_loss=0.0001661, whisper_loss=0.09113, over 3876654.39 frames. ], batch size: 80, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:59:07,325 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-13 09:59:19,587 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 09:59:31,902 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 09:59:38,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2111230.0, ans=0.125 2024-08-13 09:59:43,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2111230.0, ans=0.0 2024-08-13 10:00:03,548 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 10:00:08,258 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.520e+01 2.689e+01 2.972e+01 4.311e+01, threshold=5.378e+01, percent-clipped=0.0 2024-08-13 10:00:10,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2111330.0, ans=0.125 2024-08-13 10:00:14,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8250, loss[loss=0.1083, beats_loss=0.01057, ecapa_loss=0.0001433, whisper_loss=0.09632, over 23210.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01091, ecapa_loss=0.0001646, whisper_loss=0.09135, over 3918906.05 frames. ], batch size: 92, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:00:23,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2111430.0, ans=0.125 2024-08-13 10:00:46,356 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 10:00:51,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2111630.0, ans=0.1 2024-08-13 10:00:56,538 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-13 10:01:06,399 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 10:01:07,682 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
23 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-13 10:01:07,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2111730.0, ans=0.125 2024-08-13 10:01:30,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2111830.0, ans=0.0 2024-08-13 10:01:34,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2111930.0, ans=0.0 2024-08-13 10:01:35,587 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8300, loss[loss=0.0961, beats_loss=0.009306, ecapa_loss=0.0001558, whisper_loss=0.08523, over 21872.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01088, ecapa_loss=0.0001646, whisper_loss=0.0917, over 3928127.45 frames. ], batch size: 86, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:01:51,056 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-13 10:02:00,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2112030.0, ans=0.125 2024-08-13 10:02:04,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2112130.0, ans=0.0 2024-08-13 10:02:19,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2112130.0, ans=0.1 2024-08-13 10:02:29,719 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 34 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-13 10:02:39,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2112330.0, ans=0.0 2024-08-13 10:02:42,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.85 vs. 
limit=22.5 2024-08-13 10:02:46,560 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.390e+01 2.767e+01 3.084e+01 3.775e+01, threshold=5.535e+01, percent-clipped=0.0 2024-08-13 10:02:46,921 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 10:02:52,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8350, loss[loss=0.1066, beats_loss=0.01024, ecapa_loss=0.0001909, whisper_loss=0.09449, over 18311.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001657, whisper_loss=0.09177, over 3920530.93 frames. ], batch size: 75, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:03:46,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2112730.0, ans=0.1 2024-08-13 10:04:04,979 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 30 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-13 10:04:10,719 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8400, loss[loss=0.1144, beats_loss=0.01088, ecapa_loss=0.0001364, whisper_loss=0.1022, over 17849.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.0001655, whisper_loss=0.09203, over 3903960.27 frames. ], batch size: 65, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:04:27,349 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 10:04:39,855 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
22 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 10:04:40,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2113130.0, ans=0.04949747468305833 2024-08-13 10:05:04,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2113230.0, ans=0.125 2024-08-13 10:05:10,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2113230.0, ans=0.2 2024-08-13 10:05:15,221 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 10:05:15,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2113330.0, ans=0.125 2024-08-13 10:05:22,088 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.471e+01 2.703e+01 3.041e+01 5.042e+01, threshold=5.407e+01, percent-clipped=0.0 2024-08-13 10:05:28,248 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8450, loss[loss=0.1235, beats_loss=0.01025, ecapa_loss=0.0001494, whisper_loss=0.1117, over 23324.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01084, ecapa_loss=0.000166, whisper_loss=0.09194, over 3892856.91 frames. ], batch size: 88, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:05:32,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2113430.0, ans=0.025 2024-08-13 10:05:33,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2113430.0, ans=0.09899494936611666 2024-08-13 10:05:36,331 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 10:05:51,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2113530.0, ans=0.0 2024-08-13 10:05:55,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2113530.0, ans=0.0 2024-08-13 10:06:02,681 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 10:06:03,973 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 34 from Vox, 35 fro AS 2024-08-13 10:06:44,949 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 28 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-13 10:06:47,926 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 10:06:48,882 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8500, loss[loss=0.1127, beats_loss=0.01169, ecapa_loss=0.0001512, whisper_loss=0.09952, over 23293.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0108, ecapa_loss=0.0001668, whisper_loss=0.09233, over 3901098.71 frames. ], batch size: 93, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:06:57,488 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. 
limit=6.0 2024-08-13 10:07:16,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2114030.0, ans=0.1 2024-08-13 10:08:04,264 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.378e+01 2.649e+01 2.972e+01 5.253e+01, threshold=5.297e+01, percent-clipped=0.0 2024-08-13 10:08:07,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2114330.0, ans=0.125 2024-08-13 10:08:10,670 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8550, loss[loss=0.1152, beats_loss=0.008156, ecapa_loss=0.0001958, whisper_loss=0.1051, over 21389.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01075, ecapa_loss=0.0001665, whisper_loss=0.09207, over 3884936.50 frames. ], batch size: 87, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:08:11,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2114430.0, ans=0.2 2024-08-13 10:08:31,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.38 vs. limit=15.0 2024-08-13 10:08:34,088 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
18 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 10:08:36,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2114530.0, ans=0.1 2024-08-13 10:08:39,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2114530.0, ans=0.125 2024-08-13 10:08:51,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2114630.0, ans=0.125 2024-08-13 10:09:13,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2114830.0, ans=0.125 2024-08-13 10:09:16,977 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 27 from Vox, 20 fro AS 2024-08-13 10:09:23,465 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 10:09:31,214 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8600, loss[loss=0.09753, beats_loss=0.007789, ecapa_loss=0.0001648, whisper_loss=0.08809, over 15496.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01076, ecapa_loss=0.0001661, whisper_loss=0.09224, over 3877843.74 frames. 
], batch size: 57, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:09:45,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2114930.0, ans=0.125 2024-08-13 10:09:47,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2115030.0, ans=0.0 2024-08-13 10:09:58,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2115030.0, ans=0.125 2024-08-13 10:10:01,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2115030.0, ans=0.0 2024-08-13 10:10:02,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2115130.0, ans=0.125 2024-08-13 10:10:11,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.48 vs. limit=15.0 2024-08-13 10:10:18,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0 2024-08-13 10:10:27,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.22 vs. limit=10.0 2024-08-13 10:10:30,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2115230.0, ans=0.1 2024-08-13 10:10:37,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2115330.0, ans=0.125 2024-08-13 10:10:41,275 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-13 10:10:45,073 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.403e+01 2.760e+01 3.057e+01 6.734e+01, threshold=5.520e+01, percent-clipped=3.0 2024-08-13 10:10:47,665 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 10:10:51,401 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8650, loss[loss=0.06935, beats_loss=0.01501, ecapa_loss=0.0001269, whisper_loss=0.05307, over 13076.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0108, ecapa_loss=0.0001654, whisper_loss=0.09214, over 3892965.44 frames. ], batch size: 54, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:10:55,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2115430.0, ans=10.0 2024-08-13 10:10:58,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2115430.0, ans=0.0 2024-08-13 10:11:26,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2115630.0, ans=0.2 2024-08-13 10:11:44,236 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 10:11:46,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.45 vs. limit=22.5 2024-08-13 10:12:08,309 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8700, loss[loss=0.1027, beats_loss=0.009272, ecapa_loss=0.000196, whisper_loss=0.09149, over 15942.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01075, ecapa_loss=0.0001665, whisper_loss=0.09195, over 3873072.91 frames. ], batch size: 67, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:12:25,561 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
26 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 10:12:27,902 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 10:12:49,826 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=12.0 2024-08-13 10:13:24,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.443e+01 2.656e+01 3.130e+01 5.733e+01, threshold=5.311e+01, percent-clipped=2.0 2024-08-13 10:13:29,173 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 10:13:30,127 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8750, loss[loss=0.1187, beats_loss=0.01036, ecapa_loss=0.0001824, whisper_loss=0.1065, over 16077.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001678, whisper_loss=0.0916, over 3847362.85 frames. ], batch size: 66, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:13:31,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2116430.0, ans=0.125 2024-08-13 10:13:47,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2116530.0, ans=0.125 2024-08-13 10:13:58,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2116530.0, ans=0.125 2024-08-13 10:14:22,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2116730.0, ans=0.09899494936611666 2024-08-13 10:14:49,855 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8800, loss[loss=0.1208, beats_loss=0.01229, ecapa_loss=0.0001479, whisper_loss=0.107, over 22597.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01083, ecapa_loss=0.0001667, whisper_loss=0.09154, over 3861715.46 frames. 
], batch size: 90, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:14:52,203 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 10:14:52,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2116930.0, ans=0.09899494936611666 2024-08-13 10:15:13,003 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 10:15:27,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2117130.0, ans=0.1 2024-08-13 10:15:33,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5 2024-08-13 10:15:40,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2024-08-13 10:15:57,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2117330.0, ans=0.2 2024-08-13 10:15:58,326 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 10:16:06,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.410e+01 2.636e+01 2.976e+01 1.522e+02, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 10:16:13,294 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8850, loss[loss=0.1029, beats_loss=0.01135, ecapa_loss=0.0001711, whisper_loss=0.08983, over 18876.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01094, ecapa_loss=0.0001659, whisper_loss=0.09101, over 3881886.78 frames. 
], batch size: 75, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:16:26,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2117430.0, ans=0.1 2024-08-13 10:16:26,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2117430.0, ans=0.0 2024-08-13 10:16:40,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2117530.0, ans=0.125 2024-08-13 10:16:52,253 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 33 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-13 10:17:01,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2117730.0, ans=0.1 2024-08-13 10:17:04,806 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2024-08-13 10:17:09,989 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 10:17:34,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8900, loss[loss=0.09622, beats_loss=0.009456, ecapa_loss=0.0002307, whisper_loss=0.08446, over 15685.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01088, ecapa_loss=0.0001675, whisper_loss=0.09131, over 3868953.44 frames. ], batch size: 65, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:17:52,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2118030.0, ans=0.125 2024-08-13 10:17:55,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2118030.0, ans=0.0 2024-08-13 10:18:24,382 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 10:18:40,718 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2024-08-13 10:18:45,836 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.39 vs. limit=15.0 2024-08-13 10:18:47,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-13 10:18:48,334 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.342e+01 2.664e+01 2.910e+01 6.216e+01, threshold=5.329e+01, percent-clipped=1.0 2024-08-13 10:18:53,172 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 10:18:54,503 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 8950, loss[loss=0.09556, beats_loss=0.01354, ecapa_loss=0.0001676, whisper_loss=0.08035, over 18195.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01096, ecapa_loss=0.0001674, whisper_loss=0.09101, over 3877287.72 frames. ], batch size: 76, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:18:55,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2118430.0, ans=0.1 2024-08-13 10:19:03,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2118430.0, ans=0.0 2024-08-13 10:19:05,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2118430.0, ans=0.0 2024-08-13 10:19:19,809 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.28 vs. 
limit=15.0 2024-08-13 10:19:34,776 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0 2024-08-13 10:19:35,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2118630.0, ans=0.1 2024-08-13 10:19:37,119 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 10:20:09,334 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 10:20:13,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9000, loss[loss=0.105, beats_loss=0.009153, ecapa_loss=0.0001513, whisper_loss=0.09431, over 16745.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001676, whisper_loss=0.091, over 3852819.67 frames. ], batch size: 64, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:20:13,266 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 10:20:54,923 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005617, whisper_loss=0.2479, over 922467.00 frames. 2024-08-13 10:21:13,628 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on SV_voxceleb1: loss=0.004578, beats_loss=0, ecapa_loss=0.0004578, whisper_loss=0, over 939242.00 frames. 2024-08-13 10:23:02,666 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on AT_audioset: loss=0.02381, beats_loss=0.02381, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-13 10:23:02,671 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-13 10:23:05,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2118930.0, ans=0.125 2024-08-13 10:23:40,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2119130.0, ans=0.1 2024-08-13 10:23:50,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2119230.0, ans=0.2 2024-08-13 10:24:06,188 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.76 vs. limit=10.0 2024-08-13 10:24:18,377 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.388e+01 2.773e+01 3.157e+01 5.459e+01, threshold=5.546e+01, percent-clipped=1.0 2024-08-13 10:24:24,654 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9050, loss[loss=0.08007, beats_loss=0.01448, ecapa_loss=0.0001626, whisper_loss=0.06397, over 22087.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0109, ecapa_loss=0.0001653, whisper_loss=0.09089, over 3827347.08 frames. ], batch size: 93, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:24:28,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=22.5 2024-08-13 10:24:29,497 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 10:24:31,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2119430.0, ans=0.125 2024-08-13 10:24:59,981 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 10:25:11,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2119730.0, ans=0.125 2024-08-13 10:25:40,287 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:25:44,359 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9100, loss[loss=0.1106, beats_loss=0.009023, ecapa_loss=0.0001487, whisper_loss=0.1001, over 22232.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01085, ecapa_loss=0.0001651, whisper_loss=0.09103, over 3862158.38 frames. ], batch size: 86, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:25:45,017 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 10:25:45,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2119930.0, ans=0.0 2024-08-13 10:25:45,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2119930.0, ans=0.125 2024-08-13 10:26:01,697 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 10:26:13,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2120030.0, ans=0.07 2024-08-13 10:26:18,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2120130.0, ans=0.2 2024-08-13 10:26:20,425 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:26:32,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2120130.0, ans=0.2 2024-08-13 10:26:39,294 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 10:26:50,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=12.0 2024-08-13 10:26:57,541 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.364e-01 2024-08-13 10:27:02,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.361e+01 2.637e+01 2.940e+01 4.647e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-13 10:27:05,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2120330.0, ans=0.2 2024-08-13 10:27:10,226 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9150, loss[loss=0.09624, beats_loss=0.00989, ecapa_loss=0.0001826, whisper_loss=0.08452, over 17312.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0109, ecapa_loss=0.0001634, whisper_loss=0.09074, over 3892982.55 frames. ], batch size: 69, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:27:20,806 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 28 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-13 10:27:25,718 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 26 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-13 10:27:27,601 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 10:27:34,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.00 vs. 
limit=15.0 2024-08-13 10:27:51,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2120630.0, ans=0.0 2024-08-13 10:28:11,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2120730.0, ans=0.125 2024-08-13 10:28:21,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2120830.0, ans=0.0 2024-08-13 10:28:25,738 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 19 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-13 10:28:29,883 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9200, loss[loss=0.1024, beats_loss=0.01262, ecapa_loss=0.0001352, whisper_loss=0.0884, over 20929.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.011, ecapa_loss=0.0001629, whisper_loss=0.09019, over 3895024.25 frames. ], batch size: 82, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:28:30,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2120930.0, ans=0.125 2024-08-13 10:28:32,553 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-13 10:29:01,004 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.76 vs. 
limit=15.0 2024-08-13 10:29:13,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2121130.0, ans=0.0 2024-08-13 10:29:18,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=2121230.0, ans=22.5 2024-08-13 10:29:30,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2121230.0, ans=0.2 2024-08-13 10:29:41,028 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 10:29:41,402 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=12.0 2024-08-13 10:29:41,969 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.412e+01 2.586e+01 2.944e+01 1.076e+02, threshold=5.171e+01, percent-clipped=1.0 2024-08-13 10:29:48,968 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9250, loss[loss=0.1142, beats_loss=0.01128, ecapa_loss=0.0001426, whisper_loss=0.1015, over 23127.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01099, ecapa_loss=0.000164, whisper_loss=0.09015, over 3917398.84 frames. ], batch size: 91, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:30:20,759 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 10:30:34,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2024-08-13 10:30:38,779 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.69 vs. 
limit=15.0 2024-08-13 10:30:38,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=15.0 2024-08-13 10:30:49,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2121730.0, ans=0.2 2024-08-13 10:30:58,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=12.0 2024-08-13 10:31:03,125 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 10:31:13,843 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9300, loss[loss=0.09247, beats_loss=0.01004, ecapa_loss=0.0001957, whisper_loss=0.08048, over 15755.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001635, whisper_loss=0.09151, over 3899088.14 frames. ], batch size: 65, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:31:27,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2121930.0, ans=0.0 2024-08-13 10:31:29,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2122030.0, ans=0.0 2024-08-13 10:31:40,647 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-08-13 10:31:43,384 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 27 from LS+wenet, 22 from Vox, 14 fro AS 2024-08-13 10:31:48,071 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.43 vs. limit=22.5 2024-08-13 10:32:12,008 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 10:32:27,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.057e+01 2.387e+01 2.545e+01 2.935e+01 6.659e+01, threshold=5.090e+01, percent-clipped=1.0 2024-08-13 10:32:34,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9350, loss[loss=0.1049, beats_loss=0.0112, ecapa_loss=0.0001104, whisper_loss=0.09262, over 20915.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01085, ecapa_loss=0.0001634, whisper_loss=0.09207, over 3848095.24 frames. ], batch size: 73, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:33:12,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2122630.0, ans=0.125 2024-08-13 10:33:12,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2122630.0, ans=0.2 2024-08-13 10:33:29,028 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-13 10:33:40,115 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 10:33:42,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2122830.0, ans=0.1 2024-08-13 10:33:55,890 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9400, loss[loss=0.1095, beats_loss=0.008522, ecapa_loss=0.0001592, whisper_loss=0.0994, over 15661.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01075, ecapa_loss=0.0001651, whisper_loss=0.09201, over 3846969.61 frames. ], batch size: 59, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:34:02,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2122930.0, ans=0.125 2024-08-13 10:34:03,503 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
16 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 10:34:23,889 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 10:34:39,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2123130.0, ans=0.125 2024-08-13 10:34:42,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2123130.0, ans=0.125 2024-08-13 10:34:45,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2123230.0, ans=0.0 2024-08-13 10:35:02,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-08-13 10:35:08,883 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:35:11,318 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.356e+01 2.664e+01 2.978e+01 5.324e+01, threshold=5.328e+01, percent-clipped=1.0 2024-08-13 10:35:16,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2123430.0, ans=0.0 2024-08-13 10:35:17,152 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9450, loss[loss=0.09738, beats_loss=0.01405, ecapa_loss=0.0001539, whisper_loss=0.08179, over 20113.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001641, whisper_loss=0.09157, over 3855624.75 frames. ], batch size: 81, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:35:38,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2123530.0, ans=0.0 2024-08-13 10:35:53,910 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
23 from LS+wenet, 15 from Vox, 43 from AS
2024-08-13 10:36:05,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. limit=10.0
2024-08-13 10:36:08,217 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 from AS
2024-08-13 10:36:42,315 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9500, loss[loss=0.106, beats_loss=0.007725, ecapa_loss=0.0002109, whisper_loss=0.09615, over 14841.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01085, ecapa_loss=0.0001639, whisper_loss=0.09097, over 3858871.68 frames. ], batch size: 59, lr: 4.16e-03, grad_scale: 1.152921504606847e+18
2024-08-13 10:36:57,469 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 27 from Vox, 29 from AS
2024-08-13 10:36:59,137 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 15 from Vox, 39 from AS
2024-08-13 10:36:59,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2123930.0, ans=0.5
2024-08-13 10:37:26,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2124130.0, ans=0.125
2024-08-13 10:37:54,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2124230.0, ans=0.2
2024-08-13 10:38:21,208 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 from AS
2024-08-13 10:38:28,719 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.383e+01 2.725e+01 3.152e+01 1.098e+02, threshold=5.450e+01, percent-clipped=1.0
2024-08-13 10:38:35,154 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 from AS
2024-08-13 10:38:35,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2124330.0, ans=0.125
2024-08-13 10:38:38,683 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9550, loss[loss=0.08794, beats_loss=0.01052, ecapa_loss=0.0001543, whisper_loss=0.07587, over 16715.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.000164, whisper_loss=0.0912, over 3872724.26 frames. ], batch size: 63, lr: 4.16e-03, grad_scale: 1.152921504606847e+18
2024-08-13 10:38:42,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2124430.0, ans=0.125
2024-08-13 10:38:42,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2124430.0, ans=0.0
2024-08-13 10:38:42,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2124430.0, ans=0.125
2024-08-13 10:39:14,167 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 10:40:11,403 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 from AS
2024-08-13 10:40:21,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2124830.0, ans=0.0
2024-08-13 10:40:27,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9600, loss[loss=0.08525, beats_loss=0.01015, ecapa_loss=0.0002061, whisper_loss=0.07304, over 17559.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01082, ecapa_loss=0.0001643, whisper_loss=0.0915, over 3896335.26 frames. ], batch size: 74, lr: 4.16e-03, grad_scale: 1.152921504606847e+18
2024-08-13 10:40:57,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2125030.0, ans=0.125
2024-08-13 10:41:13,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2125130.0, ans=0.0
2024-08-13 10:41:28,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2125230.0, ans=0.0
2024-08-13 10:41:47,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.442e+01 2.705e+01 2.957e+01 4.182e+01, threshold=5.411e+01, percent-clipped=0.0
2024-08-13 10:41:55,136 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0
2024-08-13 10:41:55,719 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9650, loss[loss=0.07054, beats_loss=0.0121, ecapa_loss=0.0001544, whisper_loss=0.05689, over 20513.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.000165, whisper_loss=0.0911, over 3873937.21 frames. ], batch size: 82, lr: 4.16e-03, grad_scale: 1.152921504606847e+18
2024-08-13 10:42:22,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0
2024-08-13 10:43:27,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9700, loss[loss=0.1133, beats_loss=0.009153, ecapa_loss=0.0001801, whisper_loss=0.1023, over 18035.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01075, ecapa_loss=0.0001659, whisper_loss=0.09098, over 3840181.07 frames. ], batch size: 73, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 10:43:36,649 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 18 from Vox, 38 from AS
2024-08-13 10:43:39,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2125930.0, ans=0.2
2024-08-13 10:43:40,424 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 from AS
2024-08-13 10:43:44,761 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 from AS
2024-08-13 10:43:49,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2126030.0, ans=0.07
2024-08-13 10:44:19,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0
2024-08-13 10:44:26,771 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=12.0
2024-08-13 10:44:33,142 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS
2024-08-13 10:44:57,755 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 17 from Vox, 21 from AS
2024-08-13 10:44:58,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2126330.0, ans=0.125
2024-08-13 10:45:09,636 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.435e+01 2.595e+01 3.006e+01 3.939e+01, threshold=5.189e+01, percent-clipped=0.0
2024-08-13 10:45:16,811 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9750, loss[loss=0.08174, beats_loss=0.009269, ecapa_loss=0.0002415, whisper_loss=0.07005, over 14353.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001664, whisper_loss=0.0909, over 3839558.96 frames. ], batch size: 59, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 10:45:21,044 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=12.0
2024-08-13 10:45:25,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2126430.0, ans=0.125
2024-08-13 10:46:05,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.64 vs. limit=10.0
2024-08-13 10:46:09,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.10 vs. limit=10.0
2024-08-13 10:46:18,472 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.05 vs. limit=10.0
2024-08-13 10:46:31,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2126730.0, ans=0.125
2024-08-13 10:46:32,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0
2024-08-13 10:46:44,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2126730.0, ans=0.035
2024-08-13 10:46:54,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0
2024-08-13 10:47:10,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2126830.0, ans=0.125
2024-08-13 10:47:12,460 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9800, loss[loss=0.09624, beats_loss=0.01074, ecapa_loss=0.0001679, whisper_loss=0.08382, over 15425.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01073, ecapa_loss=0.0001652, whisper_loss=0.09237, over 3872008.19 frames. ], batch size: 61, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 10:47:26,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2126930.0, ans=0.125
2024-08-13 10:47:37,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2127030.0, ans=0.0
2024-08-13 10:47:50,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.20 vs. limit=22.5
2024-08-13 10:47:55,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2127030.0, ans=0.0
2024-08-13 10:48:20,921 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 23 from Vox, 24 from AS
2024-08-13 10:48:28,742 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 from AS
2024-08-13 10:49:04,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.353e+01 2.628e+01 3.072e+01 7.221e+01, threshold=5.255e+01, percent-clipped=1.0
2024-08-13 10:49:12,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9850, loss[loss=0.104, beats_loss=0.01176, ecapa_loss=0.0001663, whisper_loss=0.09059, over 22851.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01064, ecapa_loss=0.0001656, whisper_loss=0.09346, over 3875114.23 frames.
], batch size: 92, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 10:49:12,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2127430.0, ans=0.95
2024-08-13 10:49:27,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0
2024-08-13 10:49:51,170 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 from AS
2024-08-13 10:50:00,434 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 23 from Vox, 36 from AS
2024-08-13 10:51:05,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9900, loss[loss=0.1011, beats_loss=0.01291, ecapa_loss=0.0001599, whisper_loss=0.08663, over 22320.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01081, ecapa_loss=0.0001651, whisper_loss=0.09247, over 3895233.73 frames. ], batch size: 90, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 10:51:20,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2128030.0, ans=0.1
2024-08-13 10:51:34,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2128030.0, ans=0.125
2024-08-13 10:51:38,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2128130.0, ans=0.0
2024-08-13 10:51:56,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2128230.0, ans=0.1
2024-08-13 10:52:07,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2128330.0, ans=0.1
2024-08-13 10:52:07,033 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.554e-03
2024-08-13 10:52:07,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2128330.0, ans=0.2
2024-08-13 10:52:12,025 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.71 vs. limit=15.0
2024-08-13 10:52:13,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2128330.0, ans=0.125
2024-08-13 10:52:18,733 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.402e+01 2.725e+01 3.042e+01 4.728e+01, threshold=5.451e+01, percent-clipped=0.0
2024-08-13 10:52:23,203 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 9950, loss[loss=0.1035, beats_loss=0.0106, ecapa_loss=0.0001807, whisper_loss=0.09105, over 18772.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01075, ecapa_loss=0.0001662, whisper_loss=0.09224, over 3861412.11 frames. ], batch size: 75, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 10:52:36,921 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 from AS
2024-08-13 10:52:59,380 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 from AS
2024-08-13 10:53:04,413 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 from AS
2024-08-13 10:53:15,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2128730.0, ans=0.1
2024-08-13 10:53:33,627 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 from AS
2024-08-13 10:53:42,740 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10000, loss[loss=0.1046, beats_loss=0.00975, ecapa_loss=0.0001743, whisper_loss=0.0931, over 16589.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01076, ecapa_loss=0.0001669, whisper_loss=0.09215, over 3838699.03 frames. ], batch size: 67, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 10:53:43,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2128930.0, ans=0.125
2024-08-13 10:53:46,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2128930.0, ans=0.1
2024-08-13 10:54:14,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2129130.0, ans=0.125
2024-08-13 10:54:32,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2129230.0, ans=0.0
2024-08-13 10:54:52,042 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 from AS
2024-08-13 10:54:57,439 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.402e+01 2.704e+01 2.977e+01 9.053e+01, threshold=5.409e+01, percent-clipped=1.0
2024-08-13 10:54:57,724 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 31 from LS+wenet, 22 from Vox, 27 from AS
2024-08-13 10:55:02,764 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10050, loss[loss=0.1193, beats_loss=0.006134, ecapa_loss=0.0002159, whisper_loss=0.111, over 16593.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01074, ecapa_loss=0.0001663, whisper_loss=0.09198, over 3876520.92 frames. ], batch size: 67, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 10:55:08,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2129430.0, ans=0.125
2024-08-13 10:55:13,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2129430.0, ans=0.125
2024-08-13 10:55:33,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2129530.0, ans=0.0
2024-08-13 10:55:37,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.41 vs. limit=15.0
2024-08-13 10:55:43,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0
2024-08-13 10:55:50,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2129730.0, ans=0.0
2024-08-13 10:55:52,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2129730.0, ans=0.125
2024-08-13 10:56:13,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2129830.0, ans=0.1
2024-08-13 10:56:25,420 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10100, loss[loss=0.1033, beats_loss=0.01128, ecapa_loss=0.0001542, whisper_loss=0.09053, over 19201.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01075, ecapa_loss=0.0001657, whisper_loss=0.09207, over 3903370.90 frames. ], batch size: 76, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 10:56:31,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2129930.0, ans=0.125
2024-08-13 10:56:55,240 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.486e-02
2024-08-13 10:57:05,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2130130.0, ans=0.125
2024-08-13 10:57:14,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2130230.0, ans=0.04949747468305833
2024-08-13 10:57:18,089 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 from AS
2024-08-13 10:57:30,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2130330.0, ans=0.1
2024-08-13 10:57:36,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2130330.0, ans=0.125
2024-08-13 10:57:38,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2130330.0, ans=0.125
2024-08-13 10:57:39,950 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 21 from Vox, 37 from AS
2024-08-13 10:57:42,148 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.392e+01 2.656e+01 2.956e+01 4.246e+01, threshold=5.312e+01, percent-clipped=0.0
2024-08-13 10:57:46,735 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10150, loss[loss=0.08751, beats_loss=0.01254, ecapa_loss=0.0001145, whisper_loss=0.07382, over 16244.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.0001657, whisper_loss=0.09137, over 3906492.00 frames. ], batch size: 63, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 10:57:49,742 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0
2024-08-13 10:57:58,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2130430.0, ans=0.125
2024-08-13 10:58:08,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2130530.0, ans=0.125
2024-08-13 10:58:10,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2130530.0, ans=0.0
2024-08-13 10:58:11,492 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 18 from Vox, 31 from AS
2024-08-13 10:58:13,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2130530.0, ans=0.125
2024-08-13 10:58:22,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2130630.0, ans=0.125
2024-08-13 10:58:28,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2130630.0, ans=0.0
2024-08-13 10:58:35,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2130730.0, ans=0.125
2024-08-13 10:58:46,268 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 17 from Vox, 35 from AS
2024-08-13 10:59:06,769 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10200, loss[loss=0.1269, beats_loss=0.007259, ecapa_loss=0.0001819, whisper_loss=0.1178, over 17882.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0107, ecapa_loss=0.000166, whisper_loss=0.09168, over 3897758.66 frames.
], batch size: 69, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 10:59:19,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0
2024-08-13 10:59:25,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2131030.0, ans=0.125
2024-08-13 10:59:34,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2131030.0, ans=0.125
2024-08-13 10:59:57,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2131230.0, ans=0.125
2024-08-13 11:00:21,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2131330.0, ans=0.1
2024-08-13 11:00:22,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.425e+01 2.688e+01 3.008e+01 5.255e+01, threshold=5.377e+01, percent-clipped=0.0
2024-08-13 11:00:27,133 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10250, loss[loss=0.09091, beats_loss=0.01181, ecapa_loss=0.0001767, whisper_loss=0.07734, over 20078.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01075, ecapa_loss=0.0001664, whisper_loss=0.09198, over 3935156.99 frames. ], batch size: 85, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 11:00:31,892 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 11:00:59,271 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 from AS
2024-08-13 11:01:03,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2131630.0, ans=0.125
2024-08-13 11:01:05,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2131630.0, ans=0.1
2024-08-13 11:01:07,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2131630.0, ans=0.1
2024-08-13 11:01:12,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0
2024-08-13 11:01:13,671 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 from AS
2024-08-13 11:01:19,661 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 17 from Vox, 45 from AS
2024-08-13 11:01:22,448 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 19 from LS+wenet, 27 from Vox, 42 from AS
2024-08-13 11:01:28,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2131730.0, ans=0.0
2024-08-13 11:01:35,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2131830.0, ans=0.0
2024-08-13 11:01:46,346 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 16 from Vox, 43 from AS
2024-08-13 11:01:48,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2131930.0, ans=0.0
2024-08-13 11:01:49,441 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10300, loss[loss=0.1058, beats_loss=0.01258, ecapa_loss=0.0001303, whisper_loss=0.09189, over 18857.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01074, ecapa_loss=0.0001646, whisper_loss=0.0926, over 3980066.20 frames. ], batch size: 72, lr: 4.16e-03, grad_scale: 5.764607523034235e+17
2024-08-13 11:02:02,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2131930.0, ans=0.125
2024-08-13 11:02:07,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2132030.0, ans=0.125
2024-08-13 11:02:08,642 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 29 from Vox, 29 from AS
2024-08-13 11:02:10,230 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS
2024-08-13 11:02:18,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2132030.0, ans=0.125
2024-08-13 11:02:19,100 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0
2024-08-13 11:02:27,690 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 from AS
2024-08-13 11:02:31,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=2132130.0, ans=0.2
2024-08-13 11:02:32,688 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 from AS
2024-08-13 11:02:51,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2132330.0, ans=0.0
2024-08-13 11:02:56,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2132330.0, ans=0.125
2024-08-13 11:03:03,604 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.417e+01 2.741e+01 3.040e+01 4.375e+02, threshold=5.481e+01, percent-clipped=2.0
2024-08-13 11:03:07,869 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10350, loss[loss=0.09469, beats_loss=0.01081, ecapa_loss=0.0001516, whisper_loss=0.08237, over 15738.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01079, ecapa_loss=0.0001659, whisper_loss=0.0921, over 3954082.13 frames. ], batch size: 61, lr: 4.15e-03, grad_scale: 5.764607523034235e+17
2024-08-13 11:03:20,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2132430.0, ans=0.125
2024-08-13 11:03:22,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2132530.0, ans=0.2
2024-08-13 11:03:33,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2132530.0, ans=0.0
2024-08-13 11:03:38,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2132630.0, ans=0.125
2024-08-13 11:03:49,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2132630.0, ans=0.1
2024-08-13 11:03:58,382 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 27 from Vox, 27 from AS
2024-08-13 11:04:05,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2132730.0, ans=0.0
2024-08-13 11:04:08,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0
2024-08-13 11:04:11,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2132830.0, ans=0.09899494936611666
2024-08-13 11:04:17,389 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0
2024-08-13 11:04:21,176 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 27 from Vox, 43 from AS
2024-08-13 11:04:24,897 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10400, loss[loss=0.1121, beats_loss=0.01093, ecapa_loss=0.000148, whisper_loss=0.09969, over 23094.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01075, ecapa_loss=0.0001648, whisper_loss=0.09219, over 3941795.88 frames. ], batch size: 93, lr: 4.15e-03, grad_scale: 5.764607523034235e+17
2024-08-13 11:04:27,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.92 vs. limit=22.5
2024-08-13 11:04:47,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2133030.0, ans=0.2
2024-08-13 11:04:52,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2133030.0, ans=0.0
2024-08-13 11:04:54,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2133030.0, ans=0.125
2024-08-13 11:05:12,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.05 vs. limit=22.5
2024-08-13 11:05:15,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2133230.0, ans=0.0
2024-08-13 11:05:20,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.59 vs. limit=15.0
2024-08-13 11:05:23,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2133230.0, ans=0.125
2024-08-13 11:05:23,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2133230.0, ans=0.125
2024-08-13 11:05:35,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2133330.0, ans=0.0
2024-08-13 11:05:37,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.409e+01 2.723e+01 2.969e+01 5.956e+01, threshold=5.446e+01, percent-clipped=1.0
2024-08-13 11:05:42,402 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10450, loss[loss=0.1025, beats_loss=0.01276, ecapa_loss=0.0001383, whisper_loss=0.08838, over 18990.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01075, ecapa_loss=0.0001647, whisper_loss=0.09245, over 3957975.51 frames.
], batch size: 77, lr: 4.15e-03, grad_scale: 5.764607523034235e+17
2024-08-13 11:06:06,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2133530.0, ans=0.125
2024-08-13 11:06:13,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2133630.0, ans=0.125
2024-08-13 11:06:24,851 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 17 from Vox, 34 from AS
2024-08-13 11:06:26,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2133730.0, ans=0.125
2024-08-13 11:06:58,824 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10500, loss[loss=0.1107, beats_loss=0.008841, ecapa_loss=0.0002032, whisper_loss=0.09979, over 22729.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01073, ecapa_loss=0.000166, whisper_loss=0.09219, over 3926560.06 frames. ], batch size: 93, lr: 4.15e-03, grad_scale: 5.764607523034235e+17
2024-08-13 11:07:14,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2134030.0, ans=0.125
2024-08-13 11:07:25,965 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.09 vs. limit=22.5
2024-08-13 11:07:42,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2134130.0, ans=0.2
2024-08-13 11:07:57,600 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 from AS
2024-08-13 11:08:10,303 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 28 from Vox, 34 from AS
2024-08-13 11:08:12,576 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.440e+01 2.652e+01 2.992e+01 8.819e+01, threshold=5.304e+01, percent-clipped=1.0
2024-08-13 11:08:17,225 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10550, loss[loss=0.1059, beats_loss=0.01009, ecapa_loss=0.0001706, whisper_loss=0.09415, over 21703.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0108, ecapa_loss=0.0001655, whisper_loss=0.09076, over 3894085.99 frames. ], batch size: 86, lr: 4.15e-03, grad_scale: 5.764607523034235e+17
2024-08-13 11:08:33,811 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 from AS
2024-08-13 11:08:35,603 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.38 vs. limit=12.0
2024-08-13 11:08:54,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2134630.0, ans=0.2
2024-08-13 11:09:01,488 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 from AS
2024-08-13 11:09:09,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2134730.0, ans=0.125
2024-08-13 11:09:18,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2134730.0, ans=0.0
2024-08-13 11:09:23,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2134830.0, ans=0.0
2024-08-13 11:09:24,925 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0
2024-08-13 11:09:36,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2134830.0, ans=0.0
2024-08-13 11:09:38,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10600, loss[loss=0.07399, beats_loss=0.01247, ecapa_loss=0.00018, whisper_loss=0.05972, over 18102.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0108, ecapa_loss=0.0001662, whisper_loss=0.0907, over 3885126.60 frames. ], batch size: 76, lr: 4.15e-03, grad_scale: 5.764607523034235e+17
2024-08-13 11:09:43,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2134930.0, ans=0.2
2024-08-13 11:09:44,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2134930.0, ans=0.1
2024-08-13 11:09:52,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2135030.0, ans=0.125
2024-08-13 11:09:59,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2135030.0, ans=0.125
2024-08-13 11:10:27,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2135230.0, ans=0.125
2024-08-13 11:10:41,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2135330.0, ans=0.1
2024-08-13 11:10:46,000 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0
2024-08-13 11:10:52,264 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.455e+01 2.918e+01 3.137e+01 4.464e+01, threshold=5.836e+01, percent-clipped=0.0
2024-08-13 11:10:52,685 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.673e-02
2024-08-13 11:10:54,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2135330.0, ans=10.0
2024-08-13 11:10:57,072 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10650, loss[loss=0.1075, beats_loss=0.01145, ecapa_loss=0.0001538, whisper_loss=0.09455, over 22334.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001657, whisper_loss=0.09086, over 3875738.88 frames. ], batch size: 87, lr: 4.15e-03, grad_scale: 5.764607523034235e+17
2024-08-13 11:11:03,420 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 11 from LS+wenet, 19 from Vox, 31 from AS
2024-08-13 11:11:05,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2135430.0, ans=0.125
2024-08-13 11:11:06,750 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 from AS
2024-08-13 11:11:10,582 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 21 from Vox, 50 from AS
2024-08-13 11:11:19,879 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
35 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 11:11:29,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2135630.0, ans=0.125 2024-08-13 11:11:35,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2135630.0, ans=0.125 2024-08-13 11:11:52,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2135730.0, ans=0.125 2024-08-13 11:12:01,938 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 28 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-13 11:12:03,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2135830.0, ans=0.0 2024-08-13 11:12:03,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2024-08-13 11:12:15,123 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10700, loss[loss=0.1347, beats_loss=0.009036, ecapa_loss=0.0001809, whisper_loss=0.1239, over 23231.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01075, ecapa_loss=0.000165, whisper_loss=0.09168, over 3880407.48 frames. ], batch size: 92, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:12:21,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-13 11:12:51,403 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.73 vs. 
limit=22.5 2024-08-13 11:13:00,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2136230.0, ans=0.1 2024-08-13 11:13:03,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2136230.0, ans=0.2 2024-08-13 11:13:03,554 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=15.0 2024-08-13 11:13:15,356 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 11:13:26,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.457e+01 2.823e+01 3.286e+01 3.691e+02, threshold=5.645e+01, percent-clipped=1.0 2024-08-13 11:13:31,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10750, loss[loss=0.1011, beats_loss=0.009478, ecapa_loss=0.0001689, whisper_loss=0.08992, over 17908.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01071, ecapa_loss=0.0001669, whisper_loss=0.09201, over 3882231.48 frames. 
], batch size: 69, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:13:50,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2136530.0, ans=0.125 2024-08-13 11:13:50,815 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.686e-01 2024-08-13 11:13:53,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2136530.0, ans=0.2 2024-08-13 11:14:25,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2136730.0, ans=0.2 2024-08-13 11:14:25,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2024-08-13 11:14:32,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2136830.0, ans=0.125 2024-08-13 11:14:32,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2024-08-13 11:14:41,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2136830.0, ans=0.125 2024-08-13 11:14:47,095 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10800, loss[loss=0.09536, beats_loss=0.01263, ecapa_loss=0.0001615, whisper_loss=0.08112, over 20944.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.0001668, whisper_loss=0.09148, over 3896309.11 frames. 
], batch size: 85, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:14:51,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2136930.0, ans=0.125 2024-08-13 11:14:54,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2136930.0, ans=0.1 2024-08-13 11:14:55,527 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2024-08-13 11:14:56,014 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 12 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-13 11:15:00,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2137030.0, ans=0.125 2024-08-13 11:15:36,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2137230.0, ans=0.2 2024-08-13 11:15:45,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2137330.0, ans=0.125 2024-08-13 11:15:54,338 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:15:56,592 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.544e+01 2.753e+01 3.369e+01 1.648e+02, threshold=5.506e+01, percent-clipped=4.0 2024-08-13 11:15:57,357 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 11:16:00,897 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10850, loss[loss=0.114, beats_loss=0.009538, ecapa_loss=0.0001568, whisper_loss=0.1029, over 19916.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001669, whisper_loss=0.09128, over 3884877.46 frames. 
], batch size: 82, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:16:12,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2137430.0, ans=0.125 2024-08-13 11:16:19,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2137530.0, ans=0.0 2024-08-13 11:16:22,737 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-13 11:17:16,510 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10900, loss[loss=0.1028, beats_loss=0.01092, ecapa_loss=0.000168, whisper_loss=0.09019, over 22154.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01087, ecapa_loss=0.000165, whisper_loss=0.09094, over 3885620.41 frames. ], batch size: 89, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:17:18,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2137930.0, ans=0.1 2024-08-13 11:17:24,700 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 11:17:29,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2137930.0, ans=0.1 2024-08-13 11:17:43,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2138030.0, ans=0.125 2024-08-13 11:17:47,014 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.69 vs. 
limit=10.0 2024-08-13 11:17:56,788 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.796e+01 2024-08-13 11:18:05,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2138230.0, ans=0.0 2024-08-13 11:18:19,672 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 11:18:26,199 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.488e+01 2.800e+01 3.283e+01 5.415e+01, threshold=5.600e+01, percent-clipped=0.0 2024-08-13 11:18:30,684 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 10950, loss[loss=0.09088, beats_loss=0.01048, ecapa_loss=0.0001923, whisper_loss=0.07847, over 21899.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01079, ecapa_loss=0.0001655, whisper_loss=0.09141, over 3905957.49 frames. ], batch size: 91, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:18:38,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2138430.0, ans=0.125 2024-08-13 11:18:39,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-08-13 11:18:42,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2138430.0, ans=0.125 2024-08-13 11:19:18,023 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 11:19:24,382 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.69 vs. 
limit=15.0 2024-08-13 11:19:27,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2138730.0, ans=0.0 2024-08-13 11:19:27,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2138730.0, ans=0.0 2024-08-13 11:19:37,755 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-13 11:19:48,397 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11000, loss[loss=0.1294, beats_loss=0.01011, ecapa_loss=0.0001301, whisper_loss=0.118, over 20381.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0107, ecapa_loss=0.0001672, whisper_loss=0.09218, over 3904074.40 frames. ], batch size: 75, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:19:50,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5 2024-08-13 11:19:51,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2138930.0, ans=0.125 2024-08-13 11:20:00,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2138930.0, ans=0.0 2024-08-13 11:20:05,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.62 vs. limit=10.0 2024-08-13 11:20:10,745 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
32 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 11:20:16,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2139130.0, ans=0.125 2024-08-13 11:20:18,292 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.107e-02 2024-08-13 11:20:21,022 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 11:20:29,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2024-08-13 11:20:32,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2139230.0, ans=15.0 2024-08-13 11:20:33,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2139230.0, ans=0.0 2024-08-13 11:20:34,613 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 11:20:36,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2139230.0, ans=0.1 2024-08-13 11:20:42,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2139230.0, ans=6.0 2024-08-13 11:20:44,027 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
26 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-13 11:20:49,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2139330.0, ans=0.0 2024-08-13 11:20:54,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2139330.0, ans=10.0 2024-08-13 11:20:54,786 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.76 vs. limit=22.5 2024-08-13 11:20:58,078 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.481e+01 2.729e+01 3.286e+01 1.330e+02, threshold=5.458e+01, percent-clipped=4.0 2024-08-13 11:21:03,172 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11050, loss[loss=0.1071, beats_loss=0.0117, ecapa_loss=0.0001571, whisper_loss=0.09379, over 23335.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01077, ecapa_loss=0.000166, whisper_loss=0.09222, over 3943809.71 frames. ], batch size: 89, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:21:10,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2139430.0, ans=0.1 2024-08-13 11:21:22,072 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 11:21:33,910 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=22.5 2024-08-13 11:21:54,273 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-13 11:22:24,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2139830.0, ans=0.125 2024-08-13 11:22:38,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11100, loss[loss=0.1262, beats_loss=0.009672, ecapa_loss=0.000185, whisper_loss=0.1147, over 15310.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0107, ecapa_loss=0.0001656, whisper_loss=0.09272, over 3946951.98 frames. ], batch size: 59, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:22:46,682 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 11:22:49,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2139930.0, ans=0.125 2024-08-13 11:22:50,997 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 11:23:00,413 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-13 11:23:07,278 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.673e-01 2024-08-13 11:23:20,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2140130.0, ans=0.125 2024-08-13 11:23:20,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2140130.0, ans=0.1 2024-08-13 11:24:11,127 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.487e+01 2.717e+01 3.069e+01 5.884e+01, threshold=5.434e+01, percent-clipped=1.0 2024-08-13 11:24:16,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11150, loss[loss=0.09608, beats_loss=0.009075, ecapa_loss=0.0001473, whisper_loss=0.08554, over 16346.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01064, ecapa_loss=0.0001653, whisper_loss=0.09244, over 3906285.41 frames. ], batch size: 63, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:24:19,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2140430.0, ans=0.1 2024-08-13 11:24:33,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2140530.0, ans=0.0 2024-08-13 11:24:46,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.27 vs. limit=10.0 2024-08-13 11:25:02,105 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 11:25:15,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2140830.0, ans=0.0 2024-08-13 11:25:17,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2140830.0, ans=0.125 2024-08-13 11:25:25,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2140830.0, ans=0.125 2024-08-13 11:25:30,060 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11200, loss[loss=0.09531, beats_loss=0.0107, ecapa_loss=0.0002104, whisper_loss=0.08251, over 21695.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01065, ecapa_loss=0.0001648, whisper_loss=0.09182, over 3880328.70 frames. 
], batch size: 93, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:25:46,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2141030.0, ans=0.125 2024-08-13 11:25:50,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2141030.0, ans=0.125 2024-08-13 11:25:55,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2141030.0, ans=0.0 2024-08-13 11:26:02,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2141130.0, ans=0.125 2024-08-13 11:26:02,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2141130.0, ans=0.2 2024-08-13 11:26:03,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2141130.0, ans=0.0 2024-08-13 11:26:39,129 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.420e+01 2.628e+01 2.915e+01 3.904e+01, threshold=5.256e+01, percent-clipped=0.0 2024-08-13 11:26:43,804 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11250, loss[loss=0.09012, beats_loss=0.009885, ecapa_loss=0.0001339, whisper_loss=0.0789, over 19493.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01063, ecapa_loss=0.0001655, whisper_loss=0.09227, over 3900789.43 frames. ], batch size: 71, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:26:49,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2141430.0, ans=0.1 2024-08-13 11:26:53,683 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 18 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 11:27:03,460 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 11:27:04,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.37 vs. limit=22.5 2024-08-13 11:27:34,673 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 11:27:43,595 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-13 11:27:57,777 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11300, loss[loss=0.1271, beats_loss=0.006876, ecapa_loss=0.0001905, whisper_loss=0.1184, over 19229.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0107, ecapa_loss=0.0001644, whisper_loss=0.0915, over 3890356.85 frames. ], batch size: 74, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:28:03,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=12.0 2024-08-13 11:28:14,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2142030.0, ans=0.0 2024-08-13 11:28:24,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2142030.0, ans=0.125 2024-08-13 11:28:39,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2142130.0, ans=0.125 2024-08-13 11:28:47,141 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-13 11:28:51,435 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:28:57,049 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
24 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-13 11:28:57,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2142330.0, ans=0.0 2024-08-13 11:29:04,511 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 11:29:07,006 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.518e+01 2.742e+01 3.086e+01 4.928e+01, threshold=5.483e+01, percent-clipped=0.0 2024-08-13 11:29:11,380 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11350, loss[loss=0.1303, beats_loss=0.007859, ecapa_loss=0.0001818, whisper_loss=0.1207, over 24059.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0106, ecapa_loss=0.0001644, whisper_loss=0.09183, over 3897032.06 frames. ], batch size: 92, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:29:14,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2142430.0, ans=0.1 2024-08-13 11:29:22,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2142430.0, ans=0.0 2024-08-13 11:29:23,616 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-13 11:29:24,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2142430.0, ans=0.125 2024-08-13 11:29:26,409 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 11:29:32,552 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 10 from Vox, 39 fro AS 2024-08-13 11:29:42,915 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 33 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-13 11:29:46,026 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 11:30:01,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2142730.0, ans=0.125 2024-08-13 11:30:07,288 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 12 from Vox, 45 fro AS 2024-08-13 11:30:11,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2142830.0, ans=15.0 2024-08-13 11:30:25,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11400, loss[loss=0.1054, beats_loss=0.008855, ecapa_loss=0.0001546, whisper_loss=0.09504, over 18311.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0107, ecapa_loss=0.0001628, whisper_loss=0.09177, over 3887042.39 frames. ], batch size: 71, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:30:30,460 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 11:30:35,459 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2024-08-13 11:30:51,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2143030.0, ans=0.2 2024-08-13 11:30:56,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.59 vs. limit=15.0 2024-08-13 11:31:09,233 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
27 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 11:31:13,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2143230.0, ans=0.125 2024-08-13 11:31:17,552 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:31:19,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2143230.0, ans=0.0 2024-08-13 11:31:24,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2143330.0, ans=0.1 2024-08-13 11:31:36,997 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.547e+01 2.847e+01 3.262e+01 4.632e+01, threshold=5.695e+01, percent-clipped=0.0 2024-08-13 11:31:39,559 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 36 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 11:31:42,388 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11450, loss[loss=0.1006, beats_loss=0.009205, ecapa_loss=0.000173, whisper_loss=0.08967, over 16746.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01067, ecapa_loss=0.0001641, whisper_loss=0.092, over 3871680.08 frames. ], batch size: 66, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:32:11,529 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 11:32:11,976 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.84 vs. limit=22.5 2024-08-13 11:32:51,441 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
25 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-13 11:32:53,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=2143830.0, ans=0.02 2024-08-13 11:32:53,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2143830.0, ans=0.125 2024-08-13 11:33:00,288 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11500, loss[loss=0.09699, beats_loss=0.01196, ecapa_loss=0.0001443, whisper_loss=0.08359, over 20137.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01076, ecapa_loss=0.0001634, whisper_loss=0.09207, over 3885041.12 frames. ], batch size: 81, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:33:05,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2143930.0, ans=0.0 2024-08-13 11:33:06,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2143930.0, ans=0.2 2024-08-13 11:33:12,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2143930.0, ans=0.125 2024-08-13 11:33:12,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2143930.0, ans=0.025 2024-08-13 11:33:25,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2144030.0, ans=0.2 2024-08-13 11:33:38,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2144130.0, ans=0.125 2024-08-13 11:33:45,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. 
limit=15.0 2024-08-13 11:33:49,993 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 11:33:51,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2144230.0, ans=0.0 2024-08-13 11:33:57,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2144230.0, ans=0.0 2024-08-13 11:34:08,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2144330.0, ans=0.125 2024-08-13 11:34:10,133 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.467e+01 2.720e+01 3.175e+01 4.456e+01, threshold=5.439e+01, percent-clipped=0.0 2024-08-13 11:34:14,708 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11550, loss[loss=0.1232, beats_loss=0.008889, ecapa_loss=0.0001663, whisper_loss=0.1127, over 15869.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01075, ecapa_loss=0.0001623, whisper_loss=0.09145, over 3863463.59 frames. ], batch size: 59, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:34:17,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2144430.0, ans=0.0 2024-08-13 11:34:26,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.20 vs. 
limit=8.0 2024-08-13 11:34:29,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2144530.0, ans=0.0 2024-08-13 11:34:35,839 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.968e-02 2024-08-13 11:34:41,306 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.565e-01 2024-08-13 11:34:59,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-13 11:35:00,388 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 11:35:16,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2144830.0, ans=0.125 2024-08-13 11:35:21,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2144830.0, ans=0.125 2024-08-13 11:35:29,251 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11600, loss[loss=0.08578, beats_loss=0.01205, ecapa_loss=0.0001692, whisper_loss=0.07204, over 22085.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0108, ecapa_loss=0.0001622, whisper_loss=0.09075, over 3865238.57 frames. 
], batch size: 90, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:35:51,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2145030.0, ans=0.125 2024-08-13 11:36:07,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2145130.0, ans=0.125 2024-08-13 11:36:15,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2145230.0, ans=0.2 2024-08-13 11:36:21,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2145230.0, ans=0.2 2024-08-13 11:36:37,795 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.430e+01 2.771e+01 3.076e+01 5.105e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-13 11:36:41,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11650, loss[loss=0.08866, beats_loss=0.01231, ecapa_loss=0.0001497, whisper_loss=0.07484, over 17517.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01088, ecapa_loss=0.0001641, whisper_loss=0.09005, over 3868620.02 frames. ], batch size: 70, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:36:43,856 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 11:36:49,884 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 11:36:51,586 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 36 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 11:36:57,407 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 11:37:03,879 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
37 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-13 11:37:06,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2145530.0, ans=0.0 2024-08-13 11:37:09,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2145530.0, ans=0.1 2024-08-13 11:37:14,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2145630.0, ans=0.0 2024-08-13 11:37:20,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2145630.0, ans=10.0 2024-08-13 11:37:24,699 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-13 11:37:29,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2145730.0, ans=0.0 2024-08-13 11:37:53,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2145830.0, ans=0.125 2024-08-13 11:37:56,255 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 32 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 11:37:57,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11700, loss[loss=0.1334, beats_loss=0.008895, ecapa_loss=0.0001791, whisper_loss=0.1227, over 20110.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0109, ecapa_loss=0.0001649, whisper_loss=0.09004, over 3884572.70 frames. ], batch size: 78, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:37:59,711 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
29 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 11:38:02,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2145930.0, ans=0.125 2024-08-13 11:38:14,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2146030.0, ans=0.0 2024-08-13 11:38:20,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2146030.0, ans=0.125 2024-08-13 11:38:38,983 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 11:38:45,839 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-13 11:38:47,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2146230.0, ans=0.2 2024-08-13 11:38:58,926 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=12.0 2024-08-13 11:39:07,588 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.516e+01 2.793e+01 3.243e+01 6.496e+01, threshold=5.587e+01, percent-clipped=2.0 2024-08-13 11:39:11,882 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11750, loss[loss=0.1047, beats_loss=0.01127, ecapa_loss=0.0001512, whisper_loss=0.09195, over 17607.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001641, whisper_loss=0.0912, over 3910927.32 frames. 
], batch size: 71, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:39:27,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2146530.0, ans=0.0 2024-08-13 11:39:28,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2024-08-13 11:39:37,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2146530.0, ans=0.125 2024-08-13 11:39:38,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2146530.0, ans=0.125 2024-08-13 11:39:43,093 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 11:39:51,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2146630.0, ans=0.0 2024-08-13 11:40:01,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=15.0 2024-08-13 11:40:08,289 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 11:40:11,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2146830.0, ans=0.0 2024-08-13 11:40:12,428 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 11:40:14,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2146830.0, ans=0.125 2024-08-13 11:40:16,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2146830.0, ans=0.125 2024-08-13 11:40:23,406 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11800, loss[loss=0.09329, beats_loss=0.01253, ecapa_loss=0.0001506, whisper_loss=0.07926, over 20123.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001641, whisper_loss=0.09135, over 3916695.82 frames. ], batch size: 81, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:40:39,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2147030.0, ans=0.125 2024-08-13 11:40:54,176 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-13 11:41:02,638 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 11:41:29,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.422e+01 2.679e+01 2.998e+01 8.058e+01, threshold=5.358e+01, percent-clipped=1.0 2024-08-13 11:41:29,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2147330.0, ans=0.0 2024-08-13 11:41:33,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11850, loss[loss=0.1323, beats_loss=0.008406, ecapa_loss=0.0001809, whisper_loss=0.1221, over 15342.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0108, ecapa_loss=0.0001643, whisper_loss=0.09187, over 3913452.57 frames. 
], batch size: 58, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:41:54,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2147530.0, ans=0.1 2024-08-13 11:41:56,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2147530.0, ans=10.0 2024-08-13 11:41:57,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2147530.0, ans=0.125 2024-08-13 11:42:07,091 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 11:42:22,342 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 11:42:31,305 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2024-08-13 11:42:34,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2147830.0, ans=0.125 2024-08-13 11:42:42,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11900, loss[loss=0.102, beats_loss=0.01084, ecapa_loss=0.0001521, whisper_loss=0.08969, over 21626.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01083, ecapa_loss=0.0001649, whisper_loss=0.09166, over 3948389.80 frames. ], batch size: 89, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:43:02,914 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 11:43:14,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2148130.0, ans=0.125 2024-08-13 11:43:31,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-13 11:43:33,471 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 11:43:35,935 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-13 11:43:42,687 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=12.0 2024-08-13 11:43:47,388 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.341e+01 2.622e+01 2.921e+01 5.658e+01, threshold=5.245e+01, percent-clipped=1.0 2024-08-13 11:43:49,021 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 11:43:51,922 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 11950, loss[loss=0.135, beats_loss=0.007402, ecapa_loss=0.0001728, whisper_loss=0.1258, over 14241.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0108, ecapa_loss=0.0001657, whisper_loss=0.09117, over 3915548.47 frames. ], batch size: 55, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:43:53,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2148430.0, ans=0.125 2024-08-13 11:43:59,053 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.74 vs. 
limit=15.0 2024-08-13 11:44:00,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.12 vs. limit=6.0 2024-08-13 11:44:14,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2148530.0, ans=0.1 2024-08-13 11:44:38,184 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 39 from Vox, 29 fro AS 2024-08-13 11:44:38,672 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0 2024-08-13 11:44:44,487 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.564e+05 2024-08-13 11:44:57,272 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12000, loss[loss=0.1194, beats_loss=0.008431, ecapa_loss=0.0001723, whisper_loss=0.1093, over 19757.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01082, ecapa_loss=0.0001656, whisper_loss=0.09084, over 3924379.57 frames. ], batch size: 77, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:44:57,272 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 11:45:36,589 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005616, whisper_loss=0.2486, over 922467.00 frames. 2024-08-13 11:45:55,785 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on SV_voxceleb1: loss=0.004517, beats_loss=0, ecapa_loss=0.0004517, whisper_loss=0, over 939242.00 frames. 
2024-08-13 11:46:22,686 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3763, 2.6658, 2.4663, 2.6240], device='cuda:2') 2024-08-13 11:47:56,538 INFO [train_multi_KD3.py:1149] (2/4) Epoch 15, validation on AT_audioset: loss=0.0239, beats_loss=0.0239, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 11:47:56,542 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-13 11:48:14,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2024-08-13 11:48:16,774 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 11:48:25,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=2149130.0, ans=0.02 2024-08-13 11:48:36,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2149230.0, ans=0.04949747468305833 2024-08-13 11:48:41,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.41 vs. limit=22.5 2024-08-13 11:48:47,673 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-13 11:48:59,217 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.423e+01 2.671e+01 3.267e+01 7.662e+01, threshold=5.342e+01, percent-clipped=3.0 2024-08-13 11:49:03,397 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12050, loss[loss=0.09278, beats_loss=0.01197, ecapa_loss=0.0001532, whisper_loss=0.07928, over 18992.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01082, ecapa_loss=0.0001661, whisper_loss=0.09057, over 3880927.75 frames. 
], batch size: 74, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:49:08,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2149430.0, ans=0.0 2024-08-13 11:49:10,915 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2024-08-13 11:49:13,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2149430.0, ans=0.035 2024-08-13 11:49:19,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2149530.0, ans=0.05 2024-08-13 11:49:21,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-08-13 11:49:24,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2149530.0, ans=0.125 2024-08-13 11:49:27,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2024-08-13 11:49:33,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2149630.0, ans=0.125 2024-08-13 11:49:41,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2149730.0, ans=0.1 2024-08-13 11:49:42,419 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
34 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 11:49:48,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2149730.0, ans=0.125 2024-08-13 11:49:56,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2149830.0, ans=0.125 2024-08-13 11:50:03,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2149830.0, ans=0.125 2024-08-13 11:50:07,981 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12100, loss[loss=0.09422, beats_loss=0.01139, ecapa_loss=0.0001794, whisper_loss=0.08103, over 20458.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001669, whisper_loss=0.09093, over 3868954.96 frames. ], batch size: 87, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:50:33,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2150130.0, ans=0.125 2024-08-13 11:50:56,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0 2024-08-13 11:50:58,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.72 vs. limit=22.5 2024-08-13 11:51:08,319 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.00 vs. 
limit=15.0 2024-08-13 11:51:08,779 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.393e+01 2.671e+01 2.986e+01 4.532e+01, threshold=5.343e+01, percent-clipped=0.0 2024-08-13 11:51:09,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2150330.0, ans=0.0 2024-08-13 11:51:12,763 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12150, loss[loss=0.0833, beats_loss=0.01382, ecapa_loss=0.0001463, whisper_loss=0.06801, over 13771.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01075, ecapa_loss=0.0001667, whisper_loss=0.09111, over 3846648.35 frames. ], batch size: 54, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:51:13,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2150430.0, ans=0.07 2024-08-13 11:51:22,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2150430.0, ans=0.125 2024-08-13 11:51:28,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2150530.0, ans=0.125 2024-08-13 11:51:36,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2150530.0, ans=0.125 2024-08-13 11:51:42,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2150630.0, ans=0.0 2024-08-13 11:51:48,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2150630.0, ans=0.125 2024-08-13 11:51:57,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2150730.0, ans=0.125 2024-08-13 11:52:03,817 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
29 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 11:52:05,183 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-13 11:52:19,378 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12200, loss[loss=0.08873, beats_loss=0.01316, ecapa_loss=0.0001321, whisper_loss=0.07425, over 17962.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.000166, whisper_loss=0.09071, over 3832893.27 frames. ], batch size: 70, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:52:19,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2150930.0, ans=0.125 2024-08-13 11:52:21,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.37 vs. limit=22.5 2024-08-13 11:52:29,037 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 11:52:32,720 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 38 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 11:52:35,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2151030.0, ans=0.0 2024-08-13 11:52:43,072 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 11:52:43,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2151030.0, ans=0.0 2024-08-13 11:52:48,759 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-13 11:52:56,271 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
18 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-13 11:53:09,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2151230.0, ans=0.125 2024-08-13 11:53:16,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2151330.0, ans=0.125 2024-08-13 11:53:17,417 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 11:53:21,089 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.490e+01 2.780e+01 3.147e+01 4.927e+01, threshold=5.560e+01, percent-clipped=0.0 2024-08-13 11:53:23,924 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-13 11:53:25,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12250, loss[loss=0.08735, beats_loss=0.01174, ecapa_loss=0.0001683, whisper_loss=0.07392, over 22552.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01072, ecapa_loss=0.0001653, whisper_loss=0.09189, over 3866933.78 frames. 
], batch size: 93, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:53:28,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2151430.0, ans=0.07 2024-08-13 11:53:29,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2151430.0, ans=0.125 2024-08-13 11:53:30,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2151430.0, ans=0.1 2024-08-13 11:53:40,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2151530.0, ans=0.125 2024-08-13 11:53:54,536 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2024-08-13 11:54:03,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2151730.0, ans=0.2 2024-08-13 11:54:04,467 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 11:54:05,623 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 11:54:06,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2151730.0, ans=0.0 2024-08-13 11:54:18,726 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 11:54:20,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2151830.0, ans=0.125 2024-08-13 11:54:30,915 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12300, loss[loss=0.08222, beats_loss=0.01005, ecapa_loss=0.0002042, whisper_loss=0.07012, over 20937.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01087, ecapa_loss=0.0001654, whisper_loss=0.0904, over 3870930.55 frames. ], batch size: 90, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:54:45,519 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 11:54:47,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2152030.0, ans=0.125 2024-08-13 11:54:53,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2152030.0, ans=0.125 2024-08-13 11:54:55,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2152030.0, ans=0.125 2024-08-13 11:54:57,897 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.64 vs. limit=15.0 2024-08-13 11:55:06,481 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 11:55:06,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.37 vs. limit=10.0 2024-08-13 11:55:09,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2152230.0, ans=0.125 2024-08-13 11:55:19,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2152230.0, ans=0.125 2024-08-13 11:55:23,644 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. 
limit=15.0 2024-08-13 11:55:32,369 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.395e+01 2.675e+01 2.989e+01 4.697e+01, threshold=5.351e+01, percent-clipped=0.0 2024-08-13 11:55:32,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2152330.0, ans=0.125 2024-08-13 11:55:33,844 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 32 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 11:55:36,423 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12350, loss[loss=0.1128, beats_loss=0.01021, ecapa_loss=0.0001595, whisper_loss=0.101, over 22421.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01088, ecapa_loss=0.0001656, whisper_loss=0.09075, over 3911232.35 frames. ], batch size: 88, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:55:37,825 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 11:56:02,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2152630.0, ans=0.0 2024-08-13 11:56:04,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2024-08-13 11:56:06,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2152630.0, ans=0.125 2024-08-13 11:56:14,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2152730.0, ans=0.125 2024-08-13 11:56:16,595 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
36 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 11:56:18,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2152730.0, ans=0.125 2024-08-13 11:56:25,925 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-13 11:56:29,483 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 11:56:31,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2152830.0, ans=0.05 2024-08-13 11:56:41,001 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12400, loss[loss=0.1089, beats_loss=0.009981, ecapa_loss=0.0001724, whisper_loss=0.09715, over 17268.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001657, whisper_loss=0.09113, over 3932413.89 frames. ], batch size: 68, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:56:47,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2152930.0, ans=0.125 2024-08-13 11:56:52,569 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 11:56:52,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2153030.0, ans=0.125 2024-08-13 11:56:56,566 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 11:57:07,281 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 36 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 11:57:07,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2153130.0, ans=0.125 2024-08-13 11:57:13,604 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
19 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-13 11:57:20,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2153230.0, ans=0.2 2024-08-13 11:57:23,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2153230.0, ans=0.5 2024-08-13 11:57:26,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2153230.0, ans=0.125 2024-08-13 11:57:37,838 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 11:57:40,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2153330.0, ans=0.125 2024-08-13 11:57:43,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.413e+01 2.568e+01 2.884e+01 5.690e+01, threshold=5.135e+01, percent-clipped=1.0 2024-08-13 11:57:47,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12450, loss[loss=0.0873, beats_loss=0.01444, ecapa_loss=0.0001384, whisper_loss=0.07148, over 20388.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01082, ecapa_loss=0.0001667, whisper_loss=0.09137, over 3918216.64 frames. ], batch size: 84, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:57:50,371 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.369e-01 2024-08-13 11:57:57,674 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 11:58:06,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2153530.0, ans=0.125 2024-08-13 11:58:24,174 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 11:58:25,581 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-13 11:58:42,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2153830.0, ans=0.125 2024-08-13 11:58:48,230 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 11:58:50,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2153830.0, ans=0.125 2024-08-13 11:58:53,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12500, loss[loss=0.09201, beats_loss=0.01008, ecapa_loss=0.0001476, whisper_loss=0.08045, over 17269.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01078, ecapa_loss=0.0001658, whisper_loss=0.09127, over 3933170.66 frames. ], batch size: 68, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:59:05,117 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-13 11:59:08,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2154030.0, ans=0.0 2024-08-13 11:59:14,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2154030.0, ans=0.125 2024-08-13 11:59:21,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.12 vs. 
limit=10.0 2024-08-13 11:59:22,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2154130.0, ans=0.2 2024-08-13 11:59:39,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2154230.0, ans=0.125 2024-08-13 11:59:53,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2154330.0, ans=0.125 2024-08-13 11:59:54,666 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.421e+01 2.658e+01 2.977e+01 4.803e+01, threshold=5.316e+01, percent-clipped=0.0 2024-08-13 11:59:55,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2154330.0, ans=0.125 2024-08-13 11:59:58,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12550, loss[loss=0.09764, beats_loss=0.01195, ecapa_loss=0.0001628, whisper_loss=0.08406, over 15877.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01078, ecapa_loss=0.0001658, whisper_loss=0.09162, over 3913088.23 frames. ], batch size: 64, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 12:00:07,593 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2024-08-13 12:00:15,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0 2024-08-13 12:00:19,728 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-13 12:00:20,970 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
17 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-13 12:00:21,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2154530.0, ans=0.125 2024-08-13 12:00:49,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2154730.0, ans=0.95 2024-08-13 12:00:50,489 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-13 12:01:02,830 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2024-08-13 12:01:04,870 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12600, loss[loss=0.09502, beats_loss=0.0111, ecapa_loss=0.0001587, whisper_loss=0.08233, over 16467.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.0001668, whisper_loss=0.09133, over 3876776.47 frames. ], batch size: 66, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 12:01:06,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0 2024-08-13 12:01:06,526 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 12:01:10,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. 
limit=6.0 2024-08-13 12:01:16,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2155030.0, ans=0.125 2024-08-13 12:01:23,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2155030.0, ans=0.2 2024-08-13 12:01:28,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2155030.0, ans=0.125 2024-08-13 12:01:43,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2155230.0, ans=0.2 2024-08-13 12:01:44,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2155230.0, ans=0.125 2024-08-13 12:01:49,783 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 12:02:06,739 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.360e+01 2.643e+01 2.873e+01 1.126e+02, threshold=5.286e+01, percent-clipped=2.0 2024-08-13 12:02:08,186 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 12:02:10,477 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12650, loss[loss=0.09236, beats_loss=0.0125, ecapa_loss=0.0001608, whisper_loss=0.07826, over 14440.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01083, ecapa_loss=0.000168, whisper_loss=0.09206, over 3885587.00 frames. 
], batch size: 60, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 12:02:24,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2155530.0, ans=0.125 2024-08-13 12:02:37,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.39 vs. limit=22.5 2024-08-13 12:02:46,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2155630.0, ans=0.1 2024-08-13 12:02:52,867 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2024-08-13 12:02:56,646 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=12.0 2024-08-13 12:03:14,943 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12700, loss[loss=0.09849, beats_loss=0.01247, ecapa_loss=0.0001746, whisper_loss=0.08427, over 22616.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001668, whisper_loss=0.09254, over 3927032.22 frames. ], batch size: 93, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 12:03:16,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2155930.0, ans=0.0 2024-08-13 12:03:20,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2155930.0, ans=10.0 2024-08-13 12:03:43,503 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. 
limit=15.0 2024-08-13 12:03:45,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2156130.0, ans=0.125 2024-08-13 12:03:47,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.02 vs. limit=6.0 2024-08-13 12:03:58,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2156230.0, ans=0.125 2024-08-13 12:04:17,453 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.422e+01 2.694e+01 3.051e+01 5.714e+01, threshold=5.388e+01, percent-clipped=1.0 2024-08-13 12:04:19,927 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12750, loss[loss=0.1239, beats_loss=0.009666, ecapa_loss=0.0001547, whisper_loss=0.1127, over 14275.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01089, ecapa_loss=0.0001659, whisper_loss=0.09254, over 3931010.86 frames. ], batch size: 57, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:04:24,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2156430.0, ans=0.0 2024-08-13 12:04:41,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2156530.0, ans=0.04949747468305833 2024-08-13 12:04:46,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2156630.0, ans=0.125 2024-08-13 12:04:53,963 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 12:04:56,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2156630.0, ans=15.0 2024-08-13 12:05:07,731 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
24 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 12:05:24,572 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 12:05:24,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2156830.0, ans=0.125 2024-08-13 12:05:24,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2156830.0, ans=0.05 2024-08-13 12:05:25,992 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-13 12:05:27,114 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12800, loss[loss=0.09271, beats_loss=0.01289, ecapa_loss=0.0001506, whisper_loss=0.07831, over 23077.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01095, ecapa_loss=0.0001657, whisper_loss=0.09163, over 3938550.48 frames. ], batch size: 94, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:05:30,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2156930.0, ans=0.125 2024-08-13 12:05:39,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2157030.0, ans=0.2 2024-08-13 12:05:40,549 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 12:05:43,540 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 12:05:47,645 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 12:05:57,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2157130.0, ans=0.0 2024-08-13 12:05:58,949 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
21 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 12:06:23,111 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 12:06:33,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2157330.0, ans=0.0 2024-08-13 12:06:34,415 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.329e+01 2.579e+01 3.123e+01 7.384e+01, threshold=5.159e+01, percent-clipped=1.0 2024-08-13 12:06:37,260 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12850, loss[loss=0.08839, beats_loss=0.01268, ecapa_loss=0.0001353, whisper_loss=0.07436, over 16885.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001662, whisper_loss=0.09166, over 3908344.82 frames. ], batch size: 66, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:07:07,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2157630.0, ans=0.125 2024-08-13 12:07:14,380 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 12:07:17,038 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 12:07:42,721 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=12.0 2024-08-13 12:07:49,833 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12900, loss[loss=0.08513, beats_loss=0.01331, ecapa_loss=0.000159, whisper_loss=0.07023, over 22750.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01091, ecapa_loss=0.0001665, whisper_loss=0.09164, over 3884325.59 frames. ], batch size: 94, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:07:51,781 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
14 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 12:08:19,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=22.5 2024-08-13 12:08:52,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2158330.0, ans=0.0 2024-08-13 12:08:57,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2158330.0, ans=0.07 2024-08-13 12:08:57,292 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 12:09:01,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.397e+01 2.771e+01 3.216e+01 4.644e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-13 12:09:05,075 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 12950, loss[loss=0.1098, beats_loss=0.0116, ecapa_loss=0.0001257, whisper_loss=0.09693, over 20161.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01085, ecapa_loss=0.0001667, whisper_loss=0.09169, over 3882910.43 frames. ], batch size: 77, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:09:21,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2158530.0, ans=0.0 2024-08-13 12:09:34,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=2158630.0, ans=0.2 2024-08-13 12:09:45,014 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 18 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 12:10:18,305 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-13 12:10:19,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13000, loss[loss=0.1056, beats_loss=0.01037, ecapa_loss=0.0001712, whisper_loss=0.09347, over 19051.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01081, ecapa_loss=0.0001669, whisper_loss=0.09262, over 3891935.82 frames. ], batch size: 74, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:10:22,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2158930.0, ans=0.125 2024-08-13 12:10:38,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2159030.0, ans=0.07 2024-08-13 12:10:43,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2159030.0, ans=0.1 2024-08-13 12:10:45,350 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.44 vs. limit=15.0 2024-08-13 12:10:56,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2159130.0, ans=0.125 2024-08-13 12:11:02,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2159230.0, ans=0.125 2024-08-13 12:11:29,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2159330.0, ans=0.125 2024-08-13 12:11:30,598 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.395e+01 2.755e+01 3.311e+01 7.767e+01, threshold=5.510e+01, percent-clipped=1.0 2024-08-13 12:11:30,811 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 38 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-13 12:11:33,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13050, loss[loss=0.09639, beats_loss=0.01132, ecapa_loss=0.0001346, whisper_loss=0.08373, over 15052.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01085, ecapa_loss=0.0001657, whisper_loss=0.09238, over 3868594.93 frames. 
], batch size: 59, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:11:34,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2159430.0, ans=0.1 2024-08-13 12:11:40,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2159430.0, ans=0.0 2024-08-13 12:11:52,630 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 12:11:55,351 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-13 12:12:19,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2159730.0, ans=0.2 2024-08-13 12:12:24,079 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2024-08-13 12:12:43,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2159830.0, ans=0.125 2024-08-13 12:12:50,429 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13100, loss[loss=0.106, beats_loss=0.008131, ecapa_loss=0.0002159, whisper_loss=0.09575, over 14366.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01079, ecapa_loss=0.0001658, whisper_loss=0.09241, over 3859181.82 frames. ], batch size: 60, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:12:52,492 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
28 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 12:13:05,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2159930.0, ans=0.1 2024-08-13 12:13:11,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2160030.0, ans=0.0 2024-08-13 12:13:15,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=15.0 2024-08-13 12:13:19,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2160030.0, ans=0.125 2024-08-13 12:13:25,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2160130.0, ans=0.0 2024-08-13 12:13:26,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2160130.0, ans=0.0 2024-08-13 12:14:09,715 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.515e+01 2.737e+01 3.188e+01 6.948e+01, threshold=5.474e+01, percent-clipped=1.0 2024-08-13 12:14:12,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13150, loss[loss=0.08989, beats_loss=0.01189, ecapa_loss=0.0001966, whisper_loss=0.07604, over 20533.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01076, ecapa_loss=0.0001658, whisper_loss=0.09229, over 3873279.95 frames. 
], batch size: 89, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:14:17,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2160430.0, ans=0.125 2024-08-13 12:14:19,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2160430.0, ans=0.125 2024-08-13 12:14:36,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2160530.0, ans=0.0 2024-08-13 12:14:42,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2160630.0, ans=0.2 2024-08-13 12:14:50,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2024-08-13 12:15:01,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2160730.0, ans=0.0 2024-08-13 12:15:15,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2024-08-13 12:15:23,532 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-13 12:15:32,587 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13200, loss[loss=0.0861, beats_loss=0.01491, ecapa_loss=0.0001174, whisper_loss=0.07001, over 19956.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01079, ecapa_loss=0.0001645, whisper_loss=0.09164, over 3860439.60 frames. ], batch size: 79, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:15:38,037 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
21 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 12:16:03,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2161130.0, ans=0.0 2024-08-13 12:16:13,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2161130.0, ans=0.125 2024-08-13 12:16:51,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.301e+01 2.587e+01 2.900e+01 9.399e+01, threshold=5.174e+01, percent-clipped=1.0 2024-08-13 12:16:54,515 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13250, loss[loss=0.1061, beats_loss=0.009211, ecapa_loss=0.0002034, whisper_loss=0.09487, over 18372.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01071, ecapa_loss=0.0001641, whisper_loss=0.09204, over 3825045.89 frames. ], batch size: 73, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:16:55,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2161430.0, ans=0.0 2024-08-13 12:17:11,032 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 12:17:14,273 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 12:17:25,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2161630.0, ans=0.125 2024-08-13 12:17:27,850 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 12:17:46,466 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 12:17:46,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2161730.0, ans=0.125 2024-08-13 12:17:51,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2161730.0, ans=0.1 2024-08-13 12:17:54,005 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-13 12:17:57,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2161830.0, ans=0.04949747468305833 2024-08-13 12:18:12,233 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13300, loss[loss=0.1042, beats_loss=0.01036, ecapa_loss=0.0001981, whisper_loss=0.09189, over 14803.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01065, ecapa_loss=0.0001648, whisper_loss=0.09219, over 3818705.69 frames. ], batch size: 62, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:18:28,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2162030.0, ans=0.125 2024-08-13 12:18:34,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2162030.0, ans=0.125 2024-08-13 12:18:40,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2162030.0, ans=0.0 2024-08-13 12:18:45,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2162130.0, ans=0.0 2024-08-13 12:19:10,573 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
22 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-13 12:19:10,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2162230.0, ans=0.2 2024-08-13 12:19:19,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2162330.0, ans=0.125 2024-08-13 12:19:26,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2162330.0, ans=0.125 2024-08-13 12:19:29,749 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.345e+01 2.598e+01 2.972e+01 4.210e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-13 12:19:33,266 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13350, loss[loss=0.1076, beats_loss=0.01052, ecapa_loss=0.000179, whisper_loss=0.0953, over 17074.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01068, ecapa_loss=0.0001635, whisper_loss=0.09234, over 3831761.94 frames. ], batch size: 72, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:19:36,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2162430.0, ans=0.0 2024-08-13 12:19:44,197 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 12:19:47,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=15.0 2024-08-13 12:19:51,633 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 12:19:53,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2162530.0, ans=0.0 2024-08-13 12:19:59,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2162530.0, ans=0.125 2024-08-13 12:20:09,439 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 12:20:10,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-13 12:20:20,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2162730.0, ans=0.125 2024-08-13 12:20:21,421 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2024-08-13 12:20:50,007 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13400, loss[loss=0.1166, beats_loss=0.008351, ecapa_loss=0.0002105, whisper_loss=0.1062, over 18356.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01081, ecapa_loss=0.0001639, whisper_loss=0.09208, over 3844758.77 frames. ], batch size: 78, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:21:12,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2163030.0, ans=0.1 2024-08-13 12:21:16,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2163030.0, ans=0.1 2024-08-13 12:21:22,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2163130.0, ans=0.0 2024-08-13 12:21:27,087 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
22 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-13 12:21:30,144 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 12:21:36,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=2163230.0, ans=0.02 2024-08-13 12:22:06,270 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.423e+01 2.749e+01 3.162e+01 4.773e+01, threshold=5.498e+01, percent-clipped=0.0 2024-08-13 12:22:06,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2163330.0, ans=0.125 2024-08-13 12:22:08,840 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13450, loss[loss=0.1125, beats_loss=0.009213, ecapa_loss=0.0001934, whisper_loss=0.1014, over 22175.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01075, ecapa_loss=0.0001645, whisper_loss=0.09238, over 3886256.00 frames. ], batch size: 91, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:22:20,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2163430.0, ans=0.125 2024-08-13 12:22:29,794 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-13 12:22:30,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2163530.0, ans=0.2 2024-08-13 12:22:42,261 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
10 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 12:22:47,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2163630.0, ans=0.125 2024-08-13 12:23:14,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2163830.0, ans=10.0 2024-08-13 12:23:15,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=12.0 2024-08-13 12:23:20,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2163830.0, ans=0.125 2024-08-13 12:23:26,672 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13500, loss[loss=0.1049, beats_loss=0.01132, ecapa_loss=0.0001828, whisper_loss=0.09175, over 21180.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01077, ecapa_loss=0.0001664, whisper_loss=0.09182, over 3884157.11 frames. ], batch size: 91, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:23:33,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2163930.0, ans=0.125 2024-08-13 12:23:54,306 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 12:23:58,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2164130.0, ans=0.125 2024-08-13 12:23:58,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2164130.0, ans=0.125 2024-08-13 12:24:02,223 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5 2024-08-13 12:24:02,739 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
13 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 12:24:30,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2164330.0, ans=0.0 2024-08-13 12:24:38,973 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 12:24:41,400 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.326e+01 2.605e+01 3.115e+01 6.571e+01, threshold=5.210e+01, percent-clipped=1.0 2024-08-13 12:24:45,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13550, loss[loss=0.1108, beats_loss=0.008235, ecapa_loss=0.0002059, whisper_loss=0.1005, over 18188.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01083, ecapa_loss=0.0001652, whisper_loss=0.09162, over 3831008.58 frames. ], batch size: 73, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:24:54,949 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 12:24:57,845 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-13 12:25:29,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2164730.0, ans=0.0 2024-08-13 12:25:39,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-13 12:25:41,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.33 vs. limit=10.0 2024-08-13 12:25:43,559 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 12:25:50,884 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
33 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 12:25:56,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2164830.0, ans=0.1 2024-08-13 12:25:56,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2164830.0, ans=0.125 2024-08-13 12:26:02,016 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13600, loss[loss=0.1046, beats_loss=0.01068, ecapa_loss=0.000159, whisper_loss=0.09238, over 21340.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01083, ecapa_loss=0.0001647, whisper_loss=0.09134, over 3879203.11 frames. ], batch size: 84, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:26:09,096 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 12:26:21,043 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-13 12:26:53,253 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 12:26:56,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2165230.0, ans=0.125 2024-08-13 12:27:02,807 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 12:27:13,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-08-13 12:27:17,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.537e+01 2.794e+01 3.122e+01 4.623e+01, threshold=5.587e+01, percent-clipped=0.0 2024-08-13 12:27:17,793 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
22 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-13 12:27:20,452 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13650, loss[loss=0.1212, beats_loss=0.009523, ecapa_loss=0.0001543, whisper_loss=0.1101, over 18138.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01089, ecapa_loss=0.0001639, whisper_loss=0.09139, over 3855067.85 frames. ], batch size: 68, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:27:21,177 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-13 12:27:22,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2165430.0, ans=0.2 2024-08-13 12:27:31,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2165430.0, ans=0.125 2024-08-13 12:27:33,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2165430.0, ans=0.2 2024-08-13 12:27:43,669 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 12:27:52,990 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 12:27:54,267 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 12:28:12,609 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 12:28:27,023 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 12:28:30,130 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
16 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 12:28:33,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2165830.0, ans=0.2 2024-08-13 12:28:38,038 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13700, loss[loss=0.09872, beats_loss=0.01126, ecapa_loss=0.0001791, whisper_loss=0.08567, over 17147.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.000164, whisper_loss=0.09208, over 3872044.56 frames. ], batch size: 72, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:29:13,552 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 12:29:17,904 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-13 12:29:25,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2166230.0, ans=0.125 2024-08-13 12:29:26,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2166230.0, ans=0.1 2024-08-13 12:29:40,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2166330.0, ans=0.1 2024-08-13 12:29:43,223 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-13 12:29:49,828 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.009e-02 2024-08-13 12:29:52,348 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.509e+01 2.844e+01 3.319e+01 7.223e+01, threshold=5.689e+01, percent-clipped=1.0 2024-08-13 12:29:55,330 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13750, loss[loss=0.08142, beats_loss=0.009278, ecapa_loss=0.0001899, whisper_loss=0.07024, over 13399.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01073, ecapa_loss=0.0001649, whisper_loss=0.09258, over 3839009.55 frames. ], batch size: 56, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:30:02,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2166430.0, ans=0.125 2024-08-13 12:30:03,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2166430.0, ans=0.0 2024-08-13 12:30:06,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2166430.0, ans=0.5 2024-08-13 12:30:09,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2166530.0, ans=0.1 2024-08-13 12:30:17,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2166530.0, ans=0.125 2024-08-13 12:30:21,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2166530.0, ans=0.125 2024-08-13 12:30:28,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2166630.0, ans=0.0 2024-08-13 12:30:45,995 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 12:30:49,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2024-08-13 12:31:00,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2166830.0, ans=0.125 2024-08-13 12:31:03,833 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 12:31:07,098 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 12:31:12,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13800, loss[loss=0.07931, beats_loss=0.01443, ecapa_loss=0.0001731, whisper_loss=0.06314, over 19973.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01079, ecapa_loss=0.0001653, whisper_loss=0.09203, over 3869376.50 frames. ], batch size: 86, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:31:17,112 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 12:31:27,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2167030.0, ans=0.05 2024-08-13 12:31:30,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2167030.0, ans=0.125 2024-08-13 12:31:39,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2167030.0, ans=0.125 2024-08-13 12:32:20,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2167330.0, ans=0.035 2024-08-13 12:32:25,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2167330.0, ans=0.0 2024-08-13 12:32:26,652 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.343e+01 2.633e+01 2.825e+01 4.077e+01, threshold=5.266e+01, percent-clipped=0.0 2024-08-13 12:32:30,076 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13850, loss[loss=0.1176, beats_loss=0.01127, ecapa_loss=0.0001299, whisper_loss=0.105, over 23283.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01079, ecapa_loss=0.0001647, whisper_loss=0.09214, over 3891231.08 frames. 
], batch size: 88, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:32:30,825 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2024-08-13 12:32:41,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2167430.0, ans=0.125 2024-08-13 12:32:52,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2167530.0, ans=0.0 2024-08-13 12:32:54,587 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 12:32:57,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2167530.0, ans=0.2 2024-08-13 12:33:17,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2167730.0, ans=0.125 2024-08-13 12:33:26,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2167730.0, ans=0.1 2024-08-13 12:33:30,606 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 12:33:33,836 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 12:33:47,695 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13900, loss[loss=0.09531, beats_loss=0.01311, ecapa_loss=0.0001512, whisper_loss=0.08069, over 14957.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.000164, whisper_loss=0.09177, over 3881203.16 frames. 
], batch size: 61, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:33:49,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2167930.0, ans=0.0 2024-08-13 12:33:51,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2167930.0, ans=0.125 2024-08-13 12:33:54,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-08-13 12:34:30,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2168130.0, ans=0.125 2024-08-13 12:34:42,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2168230.0, ans=0.125 2024-08-13 12:35:02,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.480e+01 2.802e+01 3.173e+01 5.254e+01, threshold=5.604e+01, percent-clipped=0.0 2024-08-13 12:35:04,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2168430.0, ans=0.125 2024-08-13 12:35:04,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2168430.0, ans=0.1 2024-08-13 12:35:05,018 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 13950, loss[loss=0.1078, beats_loss=0.01184, ecapa_loss=0.0001523, whisper_loss=0.09445, over 21652.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01074, ecapa_loss=0.000163, whisper_loss=0.09223, over 3890445.10 frames. 
], batch size: 87, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:35:12,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2168430.0, ans=0.0 2024-08-13 12:35:16,992 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-13 12:35:20,345 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2024-08-13 12:35:30,560 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 12:35:42,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2168630.0, ans=0.125 2024-08-13 12:35:47,180 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-13 12:35:55,782 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 12:36:07,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2168730.0, ans=0.2 2024-08-13 12:36:08,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2168730.0, ans=0.125 2024-08-13 12:36:12,550 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 12:36:28,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2168830.0, ans=0.1 2024-08-13 12:36:31,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 14000, loss[loss=0.09623, beats_loss=0.01107, ecapa_loss=0.0001624, whisper_loss=0.08353, over 22452.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01073, ecapa_loss=0.0001625, whisper_loss=0.09252, over 3885282.86 frames. ], batch size: 90, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:37:05,809 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 12:37:10,067 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-13 12:37:11,164 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-13 12:37:37,659 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 12:37:39,784 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-13 12:37:42,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2024-08-13 12:37:44,149 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2024-08-13 12:37:44,421 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.89 vs. limit=22.5 2024-08-13 12:37:45,721 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 12:37:56,787 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.610e+01 2.869e+01 3.326e+01 4.545e+01, threshold=5.739e+01, percent-clipped=0.0 2024-08-13 12:38:02,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 14050, loss[loss=0.08823, beats_loss=0.01091, ecapa_loss=0.0001753, whisper_loss=0.07556, over 14480.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01077, ecapa_loss=0.0001611, whisper_loss=0.0921, over 3843003.36 frames. 
], batch size: 58, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:38:10,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2169430.0, ans=0.125 2024-08-13 12:38:20,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2169430.0, ans=0.1 2024-08-13 12:38:22,231 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 12:38:49,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2169630.0, ans=0.125 2024-08-13 12:39:05,660 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 12:39:14,917 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=12.0 2024-08-13 12:39:24,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2169730.0, ans=0.2 2024-08-13 12:39:26,586 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.31 vs. limit=15.0 2024-08-13 12:39:31,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2169830.0, ans=0.1 2024-08-13 12:39:38,382 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-13 12:39:42,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2169830.0, ans=0.0 2024-08-13 12:39:48,338 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 14100, loss[loss=0.1021, beats_loss=0.01238, ecapa_loss=0.0001766, whisper_loss=0.08792, over 21210.00 frames. 
], tot_loss[loss=0.1052, beats_loss=0.01077, ecapa_loss=0.0001616, whisper_loss=0.09286, over 3841141.95 frames. ], batch size: 89, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:40:11,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=15.0 2024-08-13 12:40:22,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2170030.0, ans=6.0 2024-08-13 12:40:40,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2170130.0, ans=0.2 2024-08-13 12:40:43,894 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-13 12:41:39,603 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.420e+01 2.685e+01 3.019e+01 4.436e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-13 12:41:45,784 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 14150, loss[loss=0.09766, beats_loss=0.0129, ecapa_loss=0.0001843, whisper_loss=0.08292, over 16169.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01076, ecapa_loss=0.0001617, whisper_loss=0.09238, over 3825558.15 frames. ], batch size: 65, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:41:54,453 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.29 vs. limit=5.0 2024-08-13 12:42:58,872 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 12:43:20,946 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-13 12:43:24,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2170830.0, ans=0.0 2024-08-13 12:43:36,502 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 12:43:41,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2170830.0, ans=0.125 2024-08-13 12:43:46,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 14200, loss[loss=0.1254, beats_loss=0.008853, ecapa_loss=0.0001725, whisper_loss=0.1148, over 16718.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001606, whisper_loss=0.09182, over 3850007.73 frames. ], batch size: 64, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:43:51,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2170930.0, ans=0.1 2024-08-13 12:44:05,089 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-13 12:44:23,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. 
limit=6.0 2024-08-13 12:44:24,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2171030.0, ans=0.1 2024-08-13 12:44:38,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2171130.0, ans=0.125 2024-08-13 12:44:52,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2171130.0, ans=0.0 2024-08-13 12:44:52,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2171130.0, ans=0.0 2024-08-13 12:44:55,428 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 12:45:28,423 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 12:45:37,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2024-08-13 12:45:43,938 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.405e+01 2.774e+01 3.077e+01 4.390e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-13 12:45:49,126 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 14250, loss[loss=0.0898, beats_loss=0.01225, ecapa_loss=0.0001639, whisper_loss=0.07592, over 23186.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01079, ecapa_loss=0.0001621, whisper_loss=0.0917, over 3867279.90 frames. ], batch size: 94, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:46:16,442 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-13 12:46:28,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2171630.0, ans=0.125 2024-08-13 12:46:29,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.38 vs. limit=6.0 2024-08-13 12:46:38,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2171630.0, ans=0.125 2024-08-13 12:46:39,427 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-13 12:46:46,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2171730.0, ans=0.0 2024-08-13 12:46:54,128 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 12:46:56,812 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2024-08-13 12:46:57,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2171830.0, ans=0.2 2024-08-13 12:47:13,355 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 14300, loss[loss=0.1053, beats_loss=0.01062, ecapa_loss=0.0001504, whisper_loss=0.0932, over 17312.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001618, whisper_loss=0.09137, over 3886111.07 frames. ], batch size: 67, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:47:16,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2171930.0, ans=0.0 2024-08-13 12:47:27,694 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 12:47:31,186 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 17 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 12:47:31,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2172030.0, ans=0.125 2024-08-13 12:47:43,511 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 12:47:43,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2172130.0, ans=0.125 2024-08-13 12:47:49,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2172130.0, ans=0.125 2024-08-13 12:48:10,840 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 12:48:25,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2172330.0, ans=0.0 2024-08-13 12:48:25,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2172330.0, ans=0.125 2024-08-13 12:48:30,706 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.493e+01 2.701e+01 3.105e+01 1.229e+02, threshold=5.402e+01, percent-clipped=5.0 2024-08-13 12:48:34,919 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 14350, loss[loss=0.07379, beats_loss=0.01135, ecapa_loss=0.0001476, whisper_loss=0.06096, over 19240.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001625, whisper_loss=0.09158, over 3906659.42 frames. ], batch size: 75, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:48:35,344 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-13 12:48:49,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2172530.0, ans=0.0 2024-08-13 12:48:51,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2172530.0, ans=0.07 2024-08-13 12:48:51,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=12.0 2024-08-13 12:48:52,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2172530.0, ans=10.0 2024-08-13 12:49:07,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2172630.0, ans=0.015 2024-08-13 12:49:30,641 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-13 12:49:44,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2172830.0, ans=0.125 2024-08-13 12:49:51,480 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 12:49:54,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 14400, loss[loss=0.1156, beats_loss=0.01034, ecapa_loss=0.0001575, whisper_loss=0.1037, over 23654.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01076, ecapa_loss=0.0001629, whisper_loss=0.09192, over 3932465.22 frames. 
], batch size: 94, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:49:55,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2172930.0, ans=0.125 2024-08-13 12:49:55,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2172930.0, ans=0.2 2024-08-13 12:50:01,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2172930.0, ans=0.09899494936611666 2024-08-13 12:50:10,939 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-13 12:50:30,835 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-13 12:50:40,940 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 12:50:59,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2173330.0, ans=0.0 2024-08-13 12:51:01,063 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-13 12:51:03,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2173330.0, ans=0.125 2024-08-13 12:51:03,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2173330.0, ans=0.1 2024-08-13 12:51:11,169 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
30 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 12:51:13,878 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.394e+01 2.605e+01 2.942e+01 4.760e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-13 12:51:17,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 15, batch 14450, loss[loss=0.1187, beats_loss=0.01133, ecapa_loss=0.0001366, whisper_loss=0.106, over 17104.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01083, ecapa_loss=0.0001614, whisper_loss=0.09169, over 3932899.13 frames. ], batch size: 66, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:51:24,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2173430.0, ans=0.0 2024-08-13 12:51:26,114 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.25 vs. limit=12.0 2024-08-13 12:51:31,371 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 12:51:41,197 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 12:51:41,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2173530.0, ans=0.1 2024-08-13 12:51:44,968 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 21 from LS+wenet, 26 from Vox, 47 fro AS 2024-08-13 12:51:47,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2173630.0, ans=0.125 2024-08-13 12:51:58,235 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 12:52:02,719 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 12:52:46,757 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
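In every `optim.py` clipping record above, the printed threshold is exactly `Clipping_scale` times the median of the grad-norm quartiles (e.g. threshold 5.210e+01 = 2.0 × 2.605e+01 in the record just above). The sketch below reproduces that observed relationship; it is an inference from the logged numbers, not the actual `optim.py` implementation.

```python
# Inferred relationship (assumption from the logged values, not icefall code):
# the clipping threshold is Clipping_scale times the median grad-norm.
def clipping_threshold(quartiles: list[float], clipping_scale: float = 2.0) -> float:
    """quartiles: [min, q1, median, q3, max] as printed in the log."""
    median = quartiles[2]
    return clipping_scale * median

# Logged at 12:51:13: quartiles 1.926e+01 2.394e+01 2.605e+01 2.942e+01 4.760e+01,
# threshold=5.210e+01, Clipping_scale=2.0
thr = clipping_threshold([19.26, 23.94, 26.05, 29.42, 47.60])
```

Tying the threshold to a running median makes clipping adaptive: `percent-clipped` stays near zero in steady state and only spikes when a batch produces an outlier gradient (as in the 1.229e+02 maximum logged at 12:48:30, where percent-clipped jumped to 5.0).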
29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 12:52:47,919 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 0, loss[loss=0.112, beats_loss=0.009056, ecapa_loss=0.0001481, whisper_loss=0.1015, over 22371.00 frames. ], tot_loss[loss=0.112, beats_loss=0.009056, ecapa_loss=0.0001481, whisper_loss=0.1015, over 22371.00 frames. ], batch size: 88, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:52:47,920 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 12:53:01,206 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5013, 3.1211, 2.9945, 3.0766], device='cuda:2') 2024-08-13 12:53:29,339 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005644, whisper_loss=0.2485, over 922467.00 frames. 2024-08-13 12:53:45,300 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on SV_voxceleb1: loss=0.00454, beats_loss=0, ecapa_loss=0.000454, whisper_loss=0, over 939242.00 frames. 2024-08-13 12:55:41,413 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on AT_audioset: loss=0.02377, beats_loss=0.02377, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 12:55:41,417 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31859MB 2024-08-13 12:55:57,192 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
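The epoch-16 validation records above show the multi-task pattern: each validation set exercises only the teacher heads relevant to its task, and the other component losses are logged as exactly 0 (ASR_libri has `beats_loss=0`; SV_voxceleb1 reports only `ecapa_loss`; AT_audioset reports only `beats_loss`). The mapping below is a descriptive summary of those three logged records, with names taken from the log itself.

```python
# Descriptive summary (from the logged validation lines, not project code):
# which component losses are nonzero for each validation task.
VALID_TASKS = {
    "ASR_libri": ("whisper_loss", "ecapa_loss"),  # beats_loss logged as 0
    "SV_voxceleb1": ("ecapa_loss",),              # speaker verification only
    "AT_audioset": ("beats_loss",),               # audio tagging only
}

def active_components(task: str) -> tuple[str, ...]:
    """Component losses that contribute to this task's validation loss."""
    return VALID_TASKS[task]
```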
22 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 12:55:58,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2173810.0, ans=0.125 2024-08-13 12:56:12,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2173910.0, ans=0.125 2024-08-13 12:56:44,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2174010.0, ans=0.125 2024-08-13 12:56:50,341 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-13 12:57:21,389 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2024-08-13 12:57:28,758 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-13 12:57:38,142 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 12:57:47,598 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 50, loss[loss=0.1121, beats_loss=0.006119, ecapa_loss=0.0002229, whisper_loss=0.1037, over 15332.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01018, ecapa_loss=0.0001676, whisper_loss=0.09073, over 883340.90 frames. ], batch size: 61, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:58:11,445 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.624e+01 2.946e+01 3.270e+01 5.312e+01, threshold=5.891e+01, percent-clipped=1.0 2024-08-13 12:58:41,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2174510.0, ans=0.125 2024-08-13 12:59:27,455 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-13 12:59:41,358 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 12:59:43,499 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 100, loss[loss=0.1154, beats_loss=0.008945, ecapa_loss=0.0001553, whisper_loss=0.1049, over 17106.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0101, ecapa_loss=0.0001676, whisper_loss=0.08919, over 1547475.63 frames. ], batch size: 67, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:59:48,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2174810.0, ans=0.125 2024-08-13 12:59:54,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2174810.0, ans=0.2 2024-08-13 13:00:22,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2174910.0, ans=0.125 2024-08-13 13:01:08,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2175110.0, ans=0.0 2024-08-13 13:01:34,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 150, loss[loss=0.08614, beats_loss=0.01277, ecapa_loss=0.0001623, whisper_loss=0.07175, over 19803.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009994, ecapa_loss=0.000168, whisper_loss=0.0899, over 2055120.79 frames. ], batch size: 81, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 13:01:52,171 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.691e+01 2.914e+01 3.205e+01 4.939e+01, threshold=5.827e+01, percent-clipped=0.0 2024-08-13 13:02:24,174 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-13 13:02:27,271 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
21 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-13 13:02:29,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2175610.0, ans=0.0 2024-08-13 13:02:32,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2175610.0, ans=0.125 2024-08-13 13:02:57,009 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 200, loss[loss=0.1076, beats_loss=0.01099, ecapa_loss=0.0001435, whisper_loss=0.09522, over 21593.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01012, ecapa_loss=0.0001675, whisper_loss=0.08975, over 2437816.93 frames. ], batch size: 84, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 13:02:59,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2175810.0, ans=0.1 2024-08-13 13:03:09,433 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-13 13:03:25,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2175910.0, ans=0.0 2024-08-13 13:03:29,481 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 13:03:50,001 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 13:03:51,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2176110.0, ans=0.125 2024-08-13 13:03:56,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2176110.0, ans=0.125 2024-08-13 13:04:02,819 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 13:04:09,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2176210.0, ans=0.125 2024-08-13 13:04:13,406 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 250, loss[loss=0.1199, beats_loss=0.01141, ecapa_loss=0.0001383, whisper_loss=0.1071, over 22742.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01023, ecapa_loss=0.0001653, whisper_loss=0.0908, over 2709979.44 frames. ], batch size: 89, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:04:13,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2176310.0, ans=0.125 2024-08-13 13:04:18,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2176310.0, ans=0.125 2024-08-13 13:04:20,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.52 vs. 
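The `grad_scale` values in these records are exact powers of two: 5.764607523034235e+17 is 2^59, and around epoch 16, batch 250 it doubles to 1.152921504606847e+18, which is 2^60. That is the signature of dynamic loss scaling under AMP (`use_amp: True` with bf16 in the header), where the scale doubles after a run of overflow-free steps. The growth policy below mirrors `torch.cuda.amp.GradScaler` defaults as an assumption; the project's own scaler may differ.

```python
# Sketch of a GradScaler-style dynamic loss-scale update (assumed policy,
# mirroring torch.cuda.amp.GradScaler defaults, not this project's code).
def update_scale(scale: float, found_inf: bool,
                 growth_factor: float = 2.0, backoff_factor: float = 0.5,
                 growth_interval: int = 2000, good_steps: int = 0):
    """Return (new_scale, new_good_step_count) after one optimizer step."""
    if found_inf:
        # Overflow: halve the scale and restart the growth counter.
        return scale * backoff_factor, 0
    good_steps += 1
    if good_steps >= growth_interval:
        # Long enough without overflow: double the scale.
        return scale * growth_factor, 0
    return scale, good_steps
```

This also explains why the logged scale sits unchanged across thousands of batches and then jumps by exactly ×2.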
limit=22.5 2024-08-13 13:04:22,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2176310.0, ans=0.0 2024-08-13 13:04:27,514 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.510e+01 2.289e+01 2.601e+01 2.843e+01 4.467e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-13 13:04:32,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2176410.0, ans=0.125 2024-08-13 13:04:41,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2176510.0, ans=0.2 2024-08-13 13:04:42,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2176510.0, ans=0.125 2024-08-13 13:04:47,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2176510.0, ans=0.125 2024-08-13 13:04:56,507 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.56 vs. limit=15.0 2024-08-13 13:04:57,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2176610.0, ans=0.125 2024-08-13 13:04:58,956 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2024-08-13 13:05:15,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2176710.0, ans=0.125 2024-08-13 13:05:25,186 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 300, loss[loss=0.09551, beats_loss=0.01178, ecapa_loss=0.0001741, whisper_loss=0.08199, over 17183.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01033, ecapa_loss=0.0001657, whisper_loss=0.09117, over 2973283.94 frames. ], batch size: 70, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:05:37,120 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.52 vs. limit=15.0 2024-08-13 13:05:44,144 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 13:05:57,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2177010.0, ans=0.125 2024-08-13 13:06:02,755 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 13:06:07,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2177110.0, ans=0.5 2024-08-13 13:06:09,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2177110.0, ans=0.125 2024-08-13 13:06:23,489 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.31 vs. limit=22.5 2024-08-13 13:06:38,056 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 350, loss[loss=0.1009, beats_loss=0.01202, ecapa_loss=0.000146, whisper_loss=0.08741, over 13924.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001641, whisper_loss=0.09106, over 3152900.64 frames. 
], batch size: 57, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:06:52,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.376e+01 2.584e+01 2.917e+01 1.097e+02, threshold=5.167e+01, percent-clipped=1.0 2024-08-13 13:07:14,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2177510.0, ans=0.0 2024-08-13 13:07:14,186 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.041e-02 2024-08-13 13:07:15,867 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2024-08-13 13:07:22,490 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 27 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-13 13:07:41,069 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-13 13:07:41,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2177710.0, ans=0.125 2024-08-13 13:07:51,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 400, loss[loss=0.1174, beats_loss=0.01154, ecapa_loss=0.000167, whisper_loss=0.1042, over 22337.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01042, ecapa_loss=0.0001641, whisper_loss=0.09187, over 3305471.42 frames. ], batch size: 88, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:07:51,232 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-13 13:08:01,635 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 13:08:04,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2177910.0, ans=0.07 2024-08-13 13:08:05,527 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-13 13:08:06,863 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 23 from Vox, 17 fro AS 2024-08-13 13:08:09,755 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 13:08:16,316 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.61 vs. limit=10.0 2024-08-13 13:09:02,113 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 450, loss[loss=0.1085, beats_loss=0.01009, ecapa_loss=0.0001601, whisper_loss=0.09679, over 14001.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01038, ecapa_loss=0.0001639, whisper_loss=0.09183, over 3392083.11 frames. ], batch size: 54, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:09:16,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.363e+01 2.643e+01 2.945e+01 6.968e+01, threshold=5.285e+01, percent-clipped=1.0 2024-08-13 13:09:16,507 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-13 13:09:22,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. 
limit=15.0 2024-08-13 13:09:36,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2178510.0, ans=0.125 2024-08-13 13:09:39,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2178510.0, ans=0.0 2024-08-13 13:09:46,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2178610.0, ans=0.0 2024-08-13 13:09:49,058 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 13:09:50,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2178610.0, ans=0.0 2024-08-13 13:09:55,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2178610.0, ans=0.125 2024-08-13 13:09:59,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2178710.0, ans=0.125 2024-08-13 13:10:03,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2178710.0, ans=0.0 2024-08-13 13:10:14,114 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 500, loss[loss=0.1126, beats_loss=0.00925, ecapa_loss=0.0001591, whisper_loss=0.1017, over 22423.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01041, ecapa_loss=0.0001633, whisper_loss=0.09173, over 3489309.63 frames. ], batch size: 89, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:10:20,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2178810.0, ans=0.2 2024-08-13 13:10:27,821 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
22 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 13:10:52,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2179010.0, ans=0.1 2024-08-13 13:11:21,871 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 13:11:23,270 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-13 13:11:27,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2179310.0, ans=0.0 2024-08-13 13:11:28,320 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 550, loss[loss=0.1025, beats_loss=0.01207, ecapa_loss=0.0001467, whisper_loss=0.08893, over 23155.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01047, ecapa_loss=0.0001615, whisper_loss=0.09181, over 3582949.09 frames. ], batch size: 92, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:11:36,305 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 30 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 13:11:43,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.371e+01 2.596e+01 2.960e+01 4.995e+01, threshold=5.192e+01, percent-clipped=0.0 2024-08-13 13:11:45,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2179410.0, ans=0.0 2024-08-13 13:11:52,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2179410.0, ans=0.125 2024-08-13 13:11:53,736 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
27 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 13:11:58,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2179510.0, ans=0.125 2024-08-13 13:11:59,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2179510.0, ans=0.05 2024-08-13 13:12:02,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2179510.0, ans=0.0 2024-08-13 13:12:10,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2179610.0, ans=0.125 2024-08-13 13:12:22,188 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 13:12:29,296 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 16 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 13:12:32,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2179710.0, ans=0.09899494936611666 2024-08-13 13:12:40,844 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 600, loss[loss=0.1335, beats_loss=0.00884, ecapa_loss=0.0001494, whisper_loss=0.1231, over 17982.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01039, ecapa_loss=0.0001618, whisper_loss=0.0921, over 3635017.84 frames. ], batch size: 65, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:12:59,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2179910.0, ans=0.125 2024-08-13 13:13:09,060 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 13:13:20,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2180010.0, ans=0.09899494936611666 2024-08-13 13:13:31,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2180110.0, ans=0.2 2024-08-13 13:13:33,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2180110.0, ans=0.09899494936611666 2024-08-13 13:13:41,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2180210.0, ans=0.125 2024-08-13 13:13:53,474 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 650, loss[loss=0.09323, beats_loss=0.01167, ecapa_loss=0.0001404, whisper_loss=0.08016, over 19868.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01044, ecapa_loss=0.0001619, whisper_loss=0.09156, over 3679159.47 frames. ], batch size: 78, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:14:08,235 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.428e+01 2.791e+01 3.201e+01 6.340e+01, threshold=5.582e+01, percent-clipped=1.0 2024-08-13 13:14:10,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.70 vs. limit=22.5 2024-08-13 13:14:15,754 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 13:14:27,566 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 13:14:35,485 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 13:14:40,238 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. 
limit=15.0
2024-08-13 13:15:00,288 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 from AS
2024-08-13 13:15:01,466 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 21 from Vox, 28 from AS
2024-08-13 13:15:06,856 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 700, loss[loss=0.09548, beats_loss=0.01124, ecapa_loss=0.0001379, whisper_loss=0.08287, over 15626.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01054, ecapa_loss=0.0001618, whisper_loss=0.0917, over 3709743.98 frames. ], batch size: 59, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:15:11,993 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 22 from Vox, 28 from AS
2024-08-13 13:15:12,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2180810.0, ans=0.1
2024-08-13 13:15:22,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=22.5
2024-08-13 13:15:26,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2180910.0, ans=0.1
2024-08-13 13:15:40,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2181010.0, ans=0.0
2024-08-13 13:15:41,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.15 vs. limit=10.0
2024-08-13 13:15:42,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2181010.0, ans=0.1
2024-08-13 13:15:45,399 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0
2024-08-13 13:15:47,620 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 11 from Vox, 26 from AS
2024-08-13 13:16:03,934 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.38 vs. limit=15.0
2024-08-13 13:16:22,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 750, loss[loss=0.09319, beats_loss=0.01165, ecapa_loss=0.0001474, whisper_loss=0.08006, over 16079.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01059, ecapa_loss=0.0001615, whisper_loss=0.09136, over 3724889.35 frames. ], batch size: 63, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:16:23,556 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 18 from LS+wenet, 23 from Vox, 37 from AS
2024-08-13 13:16:26,923 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 from AS
2024-08-13 13:16:34,777 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 from AS
2024-08-13 13:16:37,662 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.319e+01 2.745e+01 2.985e+01 9.286e+01, threshold=5.489e+01, percent-clipped=1.0
2024-08-13 13:16:39,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2181410.0, ans=0.125
2024-08-13 13:16:40,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2181410.0, ans=0.125
2024-08-13 13:16:45,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2181410.0, ans=0.125
2024-08-13 13:17:03,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2181510.0, ans=0.125
2024-08-13 13:17:11,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0
2024-08-13 13:17:31,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.81 vs. limit=15.0
2024-08-13 13:17:33,490 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 from AS
2024-08-13 13:17:37,910 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 800, loss[loss=0.07861, beats_loss=0.01272, ecapa_loss=0.000126, whisper_loss=0.06462, over 14812.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001611, whisper_loss=0.09052, over 3763608.32 frames. ], batch size: 58, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:18:05,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2181910.0, ans=0.0
2024-08-13 13:18:21,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2182110.0, ans=0.2
2024-08-13 13:18:22,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2182110.0, ans=0.0
2024-08-13 13:18:49,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2182210.0, ans=0.0
2024-08-13 13:18:52,704 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 850, loss[loss=0.09108, beats_loss=0.007973, ecapa_loss=0.0001798, whisper_loss=0.08131, over 14628.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.0001616, whisper_loss=0.09053, over 3794458.17 frames. ], batch size: 58, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:18:56,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2182310.0, ans=0.1
2024-08-13 13:18:58,042 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 28 from LS+wenet, 18 from Vox, 25 from AS
2024-08-13 13:19:00,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0
2024-08-13 13:19:08,081 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.299e+01 2.538e+01 2.916e+01 7.643e+01, threshold=5.076e+01, percent-clipped=1.0
2024-08-13 13:19:10,120 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 from AS
2024-08-13 13:19:14,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2182410.0, ans=0.125
2024-08-13 13:19:39,167 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 26 from Vox, 35 from AS
2024-08-13 13:19:53,820 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 18 from Vox, 33 from AS
2024-08-13 13:20:01,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2182710.0, ans=0.125
2024-08-13 13:20:07,954 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 900, loss[loss=0.1233, beats_loss=0.006462, ecapa_loss=0.0002097, whisper_loss=0.1148, over 16021.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.000161, whisper_loss=0.09068, over 3777833.13 frames. ], batch size: 64, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:20:36,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2182910.0, ans=0.2
2024-08-13 13:21:03,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0
2024-08-13 13:21:08,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2183110.0, ans=0.2
2024-08-13 13:21:10,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2183110.0, ans=0.125
2024-08-13 13:21:16,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2183210.0, ans=0.125
2024-08-13 13:21:20,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2183210.0, ans=0.125
2024-08-13 13:21:35,564 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 950, loss[loss=0.09196, beats_loss=0.01074, ecapa_loss=0.0001351, whisper_loss=0.07987, over 14729.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001601, whisper_loss=0.09054, over 3795653.46 frames. ], batch size: 54, lr: 3.97e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:21:53,199 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.345e+01 2.599e+01 2.801e+01 4.371e+01, threshold=5.198e+01, percent-clipped=0.0
2024-08-13 13:22:01,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2183410.0, ans=0.0
2024-08-13 13:22:12,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2183510.0, ans=0.125
2024-08-13 13:22:23,471 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 from AS
2024-08-13 13:22:29,556 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 from AS
2024-08-13 13:22:42,209 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 33 from LS+wenet, 21 from Vox, 32 from AS
2024-08-13 13:22:52,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2183710.0, ans=0.0
2024-08-13 13:23:14,108 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1000, loss[loss=0.1026, beats_loss=0.01178, ecapa_loss=0.000155, whisper_loss=0.08928, over 18771.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001596, whisper_loss=0.09045, over 3819990.33 frames. ], batch size: 77, lr: 3.97e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:23:32,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.83 vs. limit=22.5
2024-08-13 13:23:55,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2183910.0, ans=0.125
2024-08-13 13:24:08,477 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=12.0
2024-08-13 13:24:31,107 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 12 from Vox, 35 from AS
2024-08-13 13:24:37,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2184110.0, ans=0.125
2024-08-13 13:25:11,348 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.82 vs. limit=10.0